
Friday, February 20, 2015

Thoughts of a Rustacean learning Go

So as many of you may know, I really like Rust and have been programming in it for nearly a year now.

Recently, for a course I had to use Go. This was an interesting opportunity; Rust and Go have been compared a lot as the "hot new languages", and finally I'd get to see the other side of the argument.

Before I get into the experience, let me preface this by mentioning that Rust and Go don't exactly target the same audiences. Go is garbage collected and is okay with trading away some performance for ergonomics, whereas Rust pushes as much as possible to compile-time checks. This makes Rust much more useful for lower-level applications.

In my specific situation, however, I was playing around with distributed systems via threads (or goroutines), so this fit perfectly into the area of applicability of both languages.


This post isn't exactly intended to be a comparison between the two. I understand that as a newbie at Go, I'll be doing things the wrong way and drawing flawed conclusions from that. My way of coding may not be the "Go way" (I'm mostly carrying over my Rust style to my Go code, since I don't know better), so everything may seem like a hack to me. Please keep this in mind whilst reading the post, and feel free to let me know the "Go way" of doing the things I was stumbling with.

This is more of a sketch of my experiences with the language, specifically from the point of view of someone coming from Rust and used to the Rusty way of doing things. It might be useful as an advisory to Rustaceans thinking about trying the language out on what to expect. Please don't take this as an attack on the language.


What I liked

Despite the performance costs, having a GC at your disposal after using Rust for so long is quite liberating. For a while my internalized borrow checker would throw red flags on me tossing around data indiscriminately, but I learned to ignore it as far as Go code goes. I was able to quickly share state via pointers without worrying about safety, which was quite useful.

Having channels as part of the language itself was also quite ergonomic. The data := <-ch and ch <- data syntax is fun to use, and whilst it's not very different from .send() and .recv() in Rust, I found it surprisingly easy to read. Initially I often got confused about which side of the arrow the channel went, but after a while I got used to it. Go also has a built-in select block for selecting over channels (Rust has a macro).
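To make the syntax concrete, here's a minimal sketch (the channel and message here are made up) showing a send, a receive, and a select:

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan string)

    // Send from another goroutine; the channel goes on the left of the arrow.
    go func() {
        ch <- "hello"
    }()

    // Receive via select; the channel goes on the right of the arrow.
    select {
    case msg := <-ch:
        fmt.Println("got", msg)
    case <-time.After(time.Second):
        fmt.Println("timed out")
    }
}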

gofmt. The Go style of coding is different from the Rust one (tabs vs spaces, how declarations look), but I continued to use the Rust style out of muscle memory (also, I was too lazy to change the settings in my editor). gofmt made life easy, since I could just run it in a directory and it would fix everything. Eventually I was able to learn the proper style by watching my code get corrected. I'd love to see a rustfmt; in fact, this is one of the proposed Summer of Code projects under Rust!


Go is great for debugging programs with multiple threads, too. It can detect deadlocks and print traces for the threads (with metadata including what code the thread was spawned from, as well as its current state). It also prints such traces when the program crashes. These are great, and saved me tons of time whilst debugging my code (which at times had all sorts of cross-interactions between more than ten goroutines in the tests). Without a green threading framework, I'm not sure how easy it would be to integrate this into Rust (for debug builds, obviously), but I'd certainly like it to be.

Go has really great green threads: goroutines. They're rather efficient (I can spawn a thousand and the runtime schedules them nicely), and easy to use.
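As a toy illustration (nothing from the actual assignments), spawning a thousand goroutines and collecting their results is this simple:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    results := make(chan int, 1000)

    // A thousand goroutines is cheap; the runtime multiplexes them
    // onto a small number of OS threads.
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            results <- n * n
        }(i)
    }

    wg.Wait()
    close(results)

    sum := 0
    for r := range results {
        sum += r
    }
    fmt.Println(sum) // sum of squares of 0..999
}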

Edit: Andrew Gallant reminded me about Go's testing support, which I'd intended to write about but forgot.

Go has really good built-in support and tooling for tests (Rust does too). I enjoyed writing tests in Go quite a bit because of this.
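For instance, go test picks up any file ending in _test.go and runs its TestXxx functions; here's a minimal sketch, where Area is a hypothetical function under test:

// shapes_test.go
package shapes

import "testing"

func TestArea(t *testing.T) {
    // Area is a hypothetical function defined elsewhere in the package.
    if got := Area(3, 4); got != 12 {
        t.Errorf("Area(3, 4) = %d; want 12", got)
    }
}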


What I didn't like


Sadly, there are a lot of things here, but bear in mind what I mentioned above about me being new to Go and not yet familiar with the "Go way" of doing things.

No enums

Rust has enums, which are basically tagged unions. Different variants can contain different types of data, so we can have, for example:
enum Shape {
    Rectangle(Point, Point),
    Circle(Point, u8),
    Triangle(Point, Point, Point)
}

and when matching/destructuring, you get type-safe access to the contents of the variant.

This is extremely useful for sending typed messages across channels. For example, in Servo we use such an enum for sending details about the progress of a fetch to the corresponding XHR object. Another such enum is used for communication between the constellation and the compositor/script.

This gives us a great degree of type safety: I can send messages with different data within them, yet I can only send messages that the other end will know how to handle, since they must all be variants of the message enum.

In Go there's no obvious way to get this. On the other hand, Go has the type interface{}, which is similar to Box<Any> in Rust or Object in Java: a pointer to any type, with the ability to match on its type. As a Rustacean I felt incredibly dirty using this, since I expected there would be an additional vtable overhead. Besides, this works for any type, so I can always accidentally send a message of the wrong type through a channel, and it'll end up crashing the other end at runtime when it hits a default: case.
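Here's roughly what that pattern looks like; the Progress and Done message types are made up for illustration:

package main

import "fmt"

// Hypothetical message types.
type Progress struct{ Loaded int }
type Done struct{}

func handle(msg interface{}) {
    switch m := msg.(type) {
    case Progress:
        fmt.Println("loaded", m.Loaded)
    case Done:
        fmt.Println("done")
    default:
        // Nothing stops a sender from putting an unexpected type on the
        // channel; we only find out here, at runtime.
        panic(fmt.Sprintf("unexpected message: %v", m))
    }
}

func main() {
    ch := make(chan interface{}, 2)
    ch <- Progress{Loaded: 42}
    ch <- Done{}
    handle(<-ch)
    handle(<-ch)
}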

Of course, I could implement a custom interface MyMessage on the various types, but this will behave exactly like interface{} (implemented on all types) unless I add a dummy method to it, which seems hackish. This brings me to my next point:

Smart interfaces

This is something many would consider a feature in Go, but from the point of view of a Rustacean, I'm rather annoyed by this.

In Go, interfaces get implemented automatically if a type has methods of a matching signature. So an interface with no methods is equivalent to interface{} and will be implemented on all types automatically. This means that we can't define "marker traits" like in Rust that add a simple layer of type safety over methods. It also means that interfaces can only be used to talk of code-level behavior, not higher-level abstractions. For example, in Rust we have the Eq trait, which uses the same method as PartialEq for equality (eq(&self, &other)), and the behavior of that method is exactly the same; however, the two traits mean fundamentally different things: a type implementing PartialEq has a partial equivalence relation, whilst one that also implements Eq has a full equivalence relation. From the point of view of the code, there's no difference between their behavior. But as a programmer, I can now write code that only accepts types with a full equivalence relation, and exploit that guarantee to optimize my code.
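A sketch of both halves of this, with made-up types:

package main

import "fmt"

type Quacker interface {
    Quack() string
}

type Duck struct{}

func (Duck) Quack() string { return "quack" }

// Robot happens to have a matching method, so it satisfies Quacker too,
// whether or not that was intended; there's no way to opt out.
type Robot struct{}

func (Robot) Quack() string { return "beep" }

// The closest thing to a Rust marker trait is an interface with a dummy
// method; with no methods at all, every type would satisfy it.
type Marked interface {
    isMarked()
}

func (Duck) isMarked() {} // only Duck opts in

func main() {
    var q Quacker = Robot{}
    fmt.Println(q.Quack())
}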

Again, having interfaces be autoimplemented on the basis of the method signature is a rather ergonomic feature in my opinion and it reduces boilerplate. It's just not what I'm used to and it restricts me from writing certain types of code.

Packages and imports

Go puts severe restrictions on where I can put my files. All files in a folder are namespaced into the same package (defining multiple packages in one folder is an error). There's no way to specify portable relative paths for importing packages either. To use a package defined in an adjacent folder, I had to do this, whereas in Rust (well, Cargo) it is easy to specify relative paths to packages (crates) like so. The import also only worked if I was developing from within my $GOPATH, so my code now resides within $GOPATH/src/github.com/Manishearth/cs733/, and I can't easily work on it elsewhere without pushing and running go get every time.
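For illustration, the import ends up looking something like this (the subpackage and function here are hypothetical); there is no "../sibling" form:

package main

// The path is always rooted at $GOPATH/src, so the code has to live there
// for this import to resolve. (Hypothetical subpackage.)
import "github.com/Manishearth/cs733/assignment1"

func main() {
    assignment1.Run() // hypothetical entry point
}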

Rust's module system does take hints from the file structure, and it can get confusing, however the behavior can be nearly arbitrarily overridden if necessary (you can even do scary things like this).
 

Documentation

Rust's libraries aren't yet well documented, agreed. But this is mostly because the libraries are still in flux and will be documented once they settle. We even have the awesome Steve Klabnik working on improving our documentation everywhere. And in general caveats are mentioned where important, even in unstable libraries.

Go, on the other hand, has stable libraries, yet the documentation seems skimpy in places. For example, for the methods in bufio which read till a delimiter, it was rather confusing whether they return only what has been buffered at the time of the call, or block until the delimiter is found. Similarly, when it comes to I/O, blocking/non-blocking behavior really should be made explicit, similar to what Sender and Receiver do in their documentation.
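For what it's worth, a quick experiment settles it: ReadString keeps reading until it sees the delimiter or hits an error, rather than returning whatever happens to be buffered.

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    r := bufio.NewReader(strings.NewReader("hello\nworld"))

    // Reads until the '\n', however many underlying reads that takes.
    line, err := r.ReadString('\n')
    fmt.Printf("%q %v\n", line, err) // "hello\n" <nil>

    // No delimiter before the stream ends: returns what it has, plus io.EOF.
    line, err = r.ReadString('\n')
    fmt.Printf("%q %v\n", line, err) // "world" EOF
}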

Generics

This is a rather common gripe: Go doesn't have any generics aside from its builtins (chans, arrays, slices, and maps can be strongly typed). As with my other points about enums and interfaces, we lose out on advanced type safety here.
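Without generics, a user-defined container degrades to interface{}; a rough sketch of the consequence:

package main

import "fmt"

// A homemade container can't be parameterized over its element type, so it
// falls back to interface{} and loses compile-time type safety.
type Stack struct {
    items []interface{}
}

func (s *Stack) Push(v interface{}) { s.items = append(s.items, v) }

func (s *Stack) Pop() interface{} {
    v := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return v
}

func main() {
    var s Stack
    s.Push(42)
    s.Push("oops") // nothing stops a type mix-up at compile time

    if n, ok := s.Pop().(int); ok {
        fmt.Println(n)
    } else {
        fmt.Println("not an int!") // the mistake only surfaces at runtime
    }
}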

Overall it seems like Go doesn't really aim for type safe abstractions, preferring runtime matching of types. That's a valid choice to make, though from a Rust background I'm not so fond of it.

Visibility

Visibility (public/private) is determined by the capitalization of the field, method, or type name. This sort of restriction doesn't hinder usability, but it's quite annoying.
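For example (a sketch; the package and names are made up):

package geometry

// Point is exported because its name starts with a capital letter.
type Point struct {
    X int // exported field
    y int // unexported: visible only within package geometry
}

// Origin is exported.
func Origin() Point { return Point{} }

// clamp is unexported; renaming it to Clamp would silently export it.
func clamp(v int) int {
    if v < 0 {
        return 0
    }
    return v
}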

Rust, on the other hand, has a keyword for exporting things, and whilst it has style recommendations for the capitalization of variable names (for good reason -- you don't want to accidentally replace an enum variant with a wildcard-esque binding in a match, for example), it doesn't error on them or change the semantics in any way; it just emits a warning. In Go, however, miscapitalizing a name suddenly makes the item private.


Conclusion

A major recurring point is that Go seems to advocate runtime checking over compile-time checking, which is the opposite of what Rust does. This is visible not just in library/language features (like the GC), but also in the tools that are provided — as mentioned above, Go does not give good tools for creating type-safe abstractions, and the programmer must add dynamic matching over types to overcome this. This is similar (though not identical) to what languages like Python and JavaScript advocate; however, those are generally interpreted, not compiled (and come with the benefits of being interpreted), so there's a good tradeoff.


Go isn't a language I intend to use for personal projects in the near future. I liked it, but there is a time overhead to learning the "Go way" of doing things, and I'd prefer to use languages I'm already familiar with. This isn't a fault of the language; it's just that I'm coming from a different paradigm and would rather not spend the time adjusting, especially since I already know languages which work equally well (or better) for its various fields of application.

I highly suggest you at least try it once, though!



Wednesday, February 18, 2015

Superfish, and why you should be worried

I'll be updating this post as I get more information.

For non-techies, skip to the last paragraph of the post for instructions on how to get rid of this adware.

Today a rather severe vulnerability on certain Lenovo laptops was discovered.


Software (well, adware) called "Superfish" came preinstalled on some of their machines (it seems like it's the Yoga series).


The software ostensibly scans images of products on the web to provide the user with alternative (perhaps cheaper) offers. Sounds like a mix of annoying and slightly useful, right?


Except that to achieve this, they do something extremely unsafe: they install a new root CA certificate into the Windows certificate store. The software works via a local proxy server which performs a man-in-the-middle attack on your web requests. Of course, most websites of importance these days use HTTPS, so to successfully decrypt and inject content on these, the proxy needs the ability to issue arbitrary certificates. It's a single certificate as far as we can tell (you can find it here in plaintext form).

It also leaves the certificate in the system even if the user does not accept the terms of use.

This means that a nontrivial portion of the population has an untrusted certificate on their machines.

It's actually worse than that, because to execute the MITM attack, the proxy server must have the private key for that certificate.

So a nontrivial portion of the population has the private key to said untrusted certificate. Anyone owning one of these laptops who has some reverse-engineering skills has the ability to intercept, modify, and duplicate the connections of anyone else owning one of these laptops. Bank logins, email, credit card numbers: everything. One usually needs physical proximity or control of a network to pull this off, but it's quite feasible that this key could be sold to someone who has that level of access (e.g. a secretly evil ISP). Update: The key is now publicly known.

This is really bad.

Installing random crapware on laptops is pretty much the norm now; and that's not the issue. Installing crapware which causes a huge security vulnerability? No thanks. What's especially annoying is their attitude towards it; they haven't even acknowledged the security hole they've caused.

The EFF has a rather nice article on this here.

As far as we can tell, Firefox isn't affected by this since it maintains its own root CA store, however we are still trying to verify this, and will see if we can block it in case Firefox is affected, for example, if Superfish installs the cert in Firefox as well. If you have an affected system and can provide information about this, feel free to comment on that bug or tweet at @ManishEarth.

The application seems to detect Firefox and install some add ons as well as the certificate. We're looking into this, further insights would be valuable.

Superfish does NOT infect Firefox.

Chrome uses the OS's root CA store, so it is affected. So is IE.

It turns out that internally they're using something called Komodia to perform the MITM, and Komodia uses a similar (broken) framework everywhere -- the private key is bundled with it, and the password is "komodia".

If you have friends at Microsoft who can look into this, please see if a hotfix can be pushed blacklisting the certificate. Whilst an "arms race" between Superfish and Windows is possible, it's unlikely, since there's more scrutiny now and it would just end up creating more trouble for Superfish. The main concern right now is that a rogue root CA cert is installed on many laptops, and the privkey is out there too.


Update: Microsoft has pulled the program and certificates via Windows Defender! Yay! Firefox is probably going to follow suit -- now that the program itself should be gone, blacklisting the certificate won't make infected users have unusable browsers.

If you own a Lenovo laptop that came preinstalled with Windows (especially one of these models), please check your task manager for an app called "VisualDiscovery" or "Superfish". Here's a small guide on how to do this. It's slightly outdated, but the section on uninstalling the program itself should work. Then, follow the steps here to remove all certificates with the name "Superfish" from the root store. Then go and change all your passwords, and check your bank, email, PayPal, and other account histories for any suspicious transactions. Chances are that you haven't been targeted, but it's good to be sure.

Alternatively, open Windows Defender, update, and scan. There are more methods here.

Thursday, January 1, 2015

Mozlandia!

Three weeks ago I was in Portland for Mozilla's Portland coincidental work week, dubbed "Mozlandia".

In Mozilla, a lot of the teams have members scattered across the globe. Work weeks are a way for team members to discuss/hack together and get to know each other better. This one was a bit special; it was all Mozilla teams having a simultaneous work week in Portland, so that the teams get a chance to work with each other.

This was my first large Mozilla event outside of the country. I was there as a member (volunteer) of the Servo team, and it was my first time meeting almost everyone there. This meant that I got to enjoy the experience of seeing my perception of various people move from an amorphous blob of username (in some cases, a face too) to an actual human being :P

I've always thought that I'm able to interact well with people online — perhaps even better than I do in person! I've spent a lot of time with various online communities (Wikipedia, Stack Exchange, Mozilla), and I've never had trouble communicating there. However, meeting everyone in person was a really awesome experience, and I was pleasantly surprised to realize that it makes it easier for me to interact with them online; something I didn't think was possible.


Initially, I'd felt a bit skeptical about the "coincidental" bit of the work week; while I wanted to meet Mozillians outside of my team, I was worried that with all the cross team discussions (and other activities) we wouldn't really get any time to focus on Servo. However, it turned out great — the cross team discussions were very productive and we got lots of time to ourselves as well.


On the first two days there were all-hands sessions (talks) in the morning. These were quite insightful, clearing up many questions about Mozilla's future goals, and were inspiring in general. Brian Muirhead's guest talk about landing a rover on Mars was particularly enjoyable. A rather interesting thing was brought up in Darren Herman's talk — Mozilla is going to start trying to make advertising on the Internet more enjoyable for consumers, rather than just fighting it outright. I'm not entirely sure how I feel about this, but they seem to be on the right track with the sponsored tiles on the new tab page; these tiles can be hidden/changed/pinned and aren't obtrusive at all (nor do they detract from the experience of surfing, since they're on a page which used to be blank).

I personally did a variety of things at the workweek:


  • I worked with Sean McArthur on getting his Hyper integration to work on Android (We finally got it to work!). I started out knowing nothing about the Android environment/NDK; now I'm much more comfortable.
  • There was an awesome session organized by Mike Hoye on helping new volunteer contributors get involved in your project. One of the major reasons I contribute to Mozilla is that they have by far the most welcoming and helpful system for newbies I've seen in any open source project. Mike gave a great introduction to the topic, covering a lot of ground on what we do well, and more importantly what we don't. Josh Matthews and Mike Conley outlined what makes a good mentored bug and what makes a good mentor. There were various other talks outlining experiences with new contributors and lessons learned (including mine). The success stories by Joel Maher and Margaret Leibovic were particularly inspiring; both of their teams have managed to get a lot of effective community involvement. After the session, a couple of us had a fun discussion over lunch on how we should move forward with this and make the newbie experience better. Mozilla may be the best at onboarding new contributors, but there still is a lot of room for improvement.
  • I was also part of a discussion on Rust's compiler plugins. Rust provides hooks for writing custom syntax extensions and lints that can run arbitrary expansion/analysis on the AST at compile time. However, the API is unstable since it deals with compiler internals. We discussed if it would be possible to somehow get a subset of this working for 1.0 (Rust 1.0 has strict backwards-compatibility requirements). We also discussed the future of Rust's serialization support (currently based on builtin syntax extensions and rather broken). Hopefully future Rust (though not 1.0) will have support for an evolved form of Erick's serde library.
  • We had a couple of discussions with the Automation team on testing, including performance/power tests. These were quite productive. I also had some interesting side discussions with Joel on performance testing; I'm now planning to see if I can improve performance in Servo by statically checking for things like unnecessary copies via the lint system.
  • I was part of the discussion on replacing some component of Firefox with one written in Rust. We were quite apprehensive about this discussion, expecting all sorts of crazy requirements — apparently the build team was tempted to tell some horror stories, but fortunately held back :P. In the end, it turned out okay — the requirements were quite reasonable and we're rather optimistic about this project.
  • I also helped Eddy on his crazy project to polyfill the polyfill for web components/Polymer so that it works in Servo.
Besides the major things listed above, there were tons of side discussions that I had on various topics, which were rather helpful.


The week ended with a great party (which had a performance by Macklemore & Ryan Lewis!).

Overall, this was quite a fun and enriching experience. I hope I'll be able to participate in such events in the future!


Tuesday, July 22, 2014

200, and more!

After my last post on my running GitHub streak, I've pretty much continued to contribute to the same projects, so I didn't see much of a point of posting about it again — the fun part about these posts is talking about all the new projects I've started or joined. However, this time an arbitrary base-ten milestone comes rather close to another development on the GitHub side which is way more awesome than a streak; hence the post.

Firstly, a screenshot:

I wish there was more dark green

Now, let's have a look at the commit that made the streak reach 200. That's right, it's a merge commit to Servo — something which is created for the collaborator who merges the pull request[1]. Which is a great segue into the second half of this post:

I now have commit/collaborator access to Servo. :D

It happened around a week back. Ms2ger needed a reviewer, Lars mentioned he wanted to get me more involved, I said I didn't mind reviewing, and in a few minutes I was reviewing a pull request for the first time. A while later I had push access.

This doesn't change my own workflow while contributing to Servo; since everyone still goes through pull requests and reviews. But it gives a much greater sense of belonging to a project. Which is saying something, since Mozilla projects already give one a sense of being "part of the team" rather early on, with the ability to attend meetings, take part in decision-making, and whatnot.

I also now get to review others' code, which is a rather interesting exercise. I haven't done much reviewing before. Pull requests to my own repos don't count for much, since they're not too frequent, and if there are small issues I tend to just merge and fix them myself. I do give feedback for patches on Firefox (mostly for the ones I mentor, or if asked on IRC), but in that situation I'm not saying that the code is worthy of being merged; I'm just pointing out issues and/or saying "Looks good to me".

With Servo, code I review and mark as OK is ready for merging. Which is a far bigger responsibility. I make mistakes (and style blunders) in my own code, so marking someone else's code as mistake free is a bit intimidating at first. Yes, everyone makes mistakes and yet we have code being reviewed properly, but right now I'm new to all this, so I'm allowed a little uncertainty ;) Hopefully in a few weeks I'll be able to review code without overthinking things too much.



In other GitHub-ish news, a freshman from my department submitted a very useful pull request to one of my repos. This makes me happy for multiple reasons: being one myself, I have a special fondness for student programmers who are not from CS (not that I don't like CS students). Such students face an extra set of challenges: finding a community, learning the hard stuff without a professor, and juggling their hobby with normal coursework (though to be fair, for most CS students their hobby programming rarely intersects with coursework either).

Additionally, the culture of improving tools that you use is one that should be spread, and it's great that at least one of the new students is a part of this culture. Finally, it means that people use my code enough to want to add more features to it :)

[1] I probably won't count this as part of my streak and will make more commits later today. Reviewing is hard, but it doesn't exactly take the place of writing actual code, so I may not count merge commits as part of my personal commit-streak rules.

Monday, April 21, 2014

The battle against self-xss

In the past few months I've been helping fight a rather interesting attack vector, dubbed "self-xss" by the Facebook security team. It's been a rather fun journey.

What is XSS?

XSS, or Cross-site scripting, is a category of attack where the attacker is able to inject JavaScript code onto a page viewed by others. A very simple example would be if Wikipedia allowed the script tag to be used in wikicode. Someone could edit a malicious script onto the page that logs the current user out (or worse).

Most modern XSS vulnerabilities have to do with improper sanitization; though in the past there used to be browser bugs related to XSS.

What is self-XSS?

Self-xss is when the users themselves serve as the attack vector; they willfully copy untrusted code and execute it (via the JavaScript console). This is a social engineering attack which is becoming increasingly common on sites like Facebook. The basic mode of attack is to convince the user that the code does something awesome (e.g. gives access to a hidden feature), and the users do the rest. With social networking sites, the code can even re-share the original post, leading to an exponentially increasing footprint.

There's a nice video explanation of the attack from one of Facebook's engineers here. An example of the attack being used on bank websites is here.

The battle

In May 2011, Chromium landed a fix that strips the javascript: from javascript URLs pasted (or dropped) into the omnibox, and Firefox landed a fix that stopped such URLs from being used from the URL bar at all. This (partially) fixes the attack mentioned in the video, though for Chrome it is possible to ask users to do something convoluted like "type j and then paste" to get it to work. This doesn't make Chrome's solution impotent, however -- more on this later.

After a while, scammers switched to the javascript console for their attacks. This went on for a while.

In July, discussions started within Mozilla on how to fix this for the console. One prominent solution was to use the Content Security Policy (CSP) to let websites ask the browser to disable the console. More on this in a blog post by Joe Walker here.

(CSP lets websites ask the browser to disable some features, like cross-origin script loading. With this the site can greatly hamper XSS and other similar attacks provided that they structure their own code to follow the CSP.)

For a while, discussions went on (tree of relevant bugs, if you're interested), though as far as I can tell nothing concrete was implemented.

In February 2014, Facebook applied a modified version of this trick to Chrome's console and enabled this change for a subset of users. When one opens the console, one is greeted with this message:

From Stack Overflow, by Derek
Trying to execute any code would result in the error message at the bottom. Fortunately, the link given there gave developers the ability to turn the console back on. (Netflix later copied this "feature", unfortunately without the opt-out.)

The loud text is not a bug; Chrome lets one style log messages. But the fact that the website has the power to (absolutely, if they wish) disable the console is a bug; websites should never have that level of power over the browser. I reported it as such soon after. I also noticed a need for a solution to self-xss: this was not the correct solution, but there seemed to be scope for a solution from the browser's side. I noted that in the bug as well.

Once the bug was fixed, the Chrome devtools team recognized that self-xss was something that might be fixed from the browser side, and converted the bug to one for self-xss. They also came up with a brilliant proposal (copied from the comment):
  • If user is considered a "first-time" user of devtools (a console history of less than 10 entries)
  • and pastes javascript into an execution context (console/watches/snippets) 
  • Chrome detects that and throws up a confirmation prompt, something like… "You may be a victim of a scam. Executing this code is probably bad for you. [Okay] [I know what I'm doing, continue]." (This part of the proposal was modified to having a prompt which explains the danger and asks the user to type "always allow" if they still wish to continue)
The proposed fix for Chromium

This fix was checked in and later rolled back; they're now considering a universal "Developer Mode" preference that comes with the appropriate warnings. I personally don't really agree with that change; when it comes to such attacks, a specific warning is always better than a general "only do this if you're a dev" message. People being convinced by a scammer to inject code into their own browsers will probably click through a generic message — after all, they know that they are doing developer-y stuff, even if they don't actually know what they're doing.

On the Firefox side, I filed a bug suggesting a similar change. A while later, Joe wrote another post delving deeper into the issue. The post frames the problem by first modeling self-xss as a "human script execution engine" (the model changes later), and notes that the more complex the "script" is, the less likely the engine is to execute it. There's an interesting analysis of what the trend probably looks like, culminating in this graph:

Taken from Joe's blog, with permission
("script" here is the English script used to scam users, not the actual code)

While we can never completely defeat this, some small increases in the necessary complexity for the "script" can go a long way. (This is the reason that Chrome's solution for the omnibox is still effective. It can be bypassed, but it makes the "script" more complex with the "type j and then paste" instructions)

We just have to balance the solutions with the annoyance to devs factor.

Turns out that Chrome's solution (for the console) is pretty balanced. People will be shown a prompt, which they will have to read through to figure out how to enable pasting. They can't just ignore it and press the nearest "OK" button they see, which would have been the case with a normal dialog. For devs, this is a minor annoyance that lasts a few seconds. Additionally, if the dev has already used the console/scratchpad in the past, the prompt won't come up in the first place, since it is disabled after 10 entries in the scratchpad/console.

Of course, the scammer could simply ask the victim to type "allow pasting", but this has two issues. Firstly, it's now Firefox-specific, so they lose half their prospective victims. This can be fixed by making two branches of instructions for Chrome and Firefox, but that still increases complexity/suspicion. Secondly, the flow for this is rather strange anyway; most would find it strange that you have to type "allow pasting", and might be curious enough to read the popup to see why. There's also a big friendly "Scam warning" header, which can catch their attention.

I wrote the patch for Firefox; this is what the current UI looks like when you try to paste something into the console or scratchpad:

Firefox's solution, for both the console and scratchpad

This got checked in today, and will probably reach the release channel in a few months.

Hopefully this takes a big bite out of the self-xss problem :)

I've been selected for Google Summer of Code 2014!

I've been selected for GSoC 2014!

My project is to implement XMLHttpRequest in Servo (proposal), under Mozilla (mentored by the awesome Josh Matthews)

Servo is a browser engine written in Rust. The Servo project is basically a research project in which Mozilla is trying to rethink the way browser engines are designed, aiming to make one that is more memory-safe (the usage of Rust is crucial to this goal) and much more parallel. If you'd like to learn more about it, check out this talk or join #servo on irc.mozilla.org.

What is GSoC?

Google Summer of Code is a program by Google that helps jumpstart involvement in open source amongst students. Organizations are invited to float their own open source projects, and students float project proposals. If selected, you get to code the project over the summer, working closely with the mentor. It's supposed to be a great experience, and I'm glad I'll get to be a part of it this year!

Who else got selected?


One purpose of this post was to recognize all my friends and college-mates who got selected:

Mozilla India friends

Friday, April 4, 2014

50 more shades of green

This is a continuation of 50 shades of green. Read that first if you want to know what this is about.


So I finally reached a hundred day GitHub streak!

Since I already rambled about GitHub, commit streaks, time management, and a bunch of other things in the previous post, I'll just use this post to list what new projects I started working on in the latter 50 days:

  • Servo, Mozilla's new browser* in Rust. A rather fun project to work on; there is a lot of scope for making an impact on the project, since a lot of the core features are as yet unimplemented. Additionally, it has a tight-knit community.
  • SE-CitationHelper: A citation helper for Stack Exchange, based on this meta request. This was actually paid work, since I don't have the time to commit to a project like that.
  • ElectionPortal: A way to quickly hold elections and polls with LDAP authentication and filtering.
  • MathToTeX: My parser for converting typed math into LaTeX. This already existed, but was a sub-repo elsewhere. I plan to reorganize this repo and then start work on rewriting the algorithm to be more extensible. 
  • Blaze: Under the Charcoal group. This project is for monitoring new posts on a medium-activity Stack Exchange site.
  • I also forked 2048 to add the ability to save one's present state. This is on a fork and doesn't count in the activity punchcard.
In addition I continued work on most of the repos mentioned in the previous post.


I was less active than in the first 50 days; I had academic commitments, extracurriculars, and more recently a trip to Kolkata and Kharagpur for a Mozilla event. Once the summer starts, I expect to be pushing more dark greens :)

I also had to tweak the rules a bit from last time. Since I was working on Servo and rebasing code often, there were days when I did commit code, but the commit was eventually moved around or left to rot in a fork. Neither of these counts in the activity punchcard -- so for these days only, I've allowed pull requests / readme edits to count. I think there are two days like this.

Let's see how far I can take it from here!


* To be precise, it's a layout engine, not a browser. A layout engine (e.g. Gecko) handles most of the core magic that makes a browser work -- parsing and displaying HTML, and interacting with JavaScript. Browser features like tabbing, preferences, and bookmarks are not a part of Servo; while it can be used (once it's stable enough) to browse the Internet, it's meant to be plugged into a browser if you want these features.

Wednesday, April 2, 2014

Introductory Firefox core development events : Setup issues

I'll be posting something about my overall experience at ProgramIIEST and MozSetup@IITKGP later this week, but I wanted to get this out there first. I may also post something about improving the good-first-bug system.


This post is partially meant as an extension to Deb's post on the same issue. Most of the contents in this post come from discussions with Deb, Sankha, Saurabh, and others: thanks, everyone!

So far I've participated in two MozSetup-style events: one at IIT Bombay (where I was a participant), and one at IIT Kharagpur (where I was a volunteer/mentor). One major issue at these events is setup. Basically, getting participants to come with a working build system is rather nontrivial, and can be a turn-off in some cases. Plus, some participants are on Windows (on the other end of the spectrum, some are on Arch), which makes things harder to sort out. Internet is not something to rely on either.

Besides that, build times are long, especially on systems like mine. At the IITB event, I had spent quite a bit of time getting a build ready. Fortunately I was able to create a couple of patches without testing, but that's certainly not the ideal way to go. Firstly, getting started takes up a huge chunk of time, and it's a bit overwhelming for participants to learn and understand the entire process. It's far better to get them involved in writing code and let them figure out the details of setting things up at their leisure.

At the Kharagpur event, I had planned on having some lab machines with a full Nightly build on them so that the students could test and make their patches on this system. This might have worked out, but we didn't have time (or lab access) the day before to initialize this. In the end, we had one machine with a full build on it, and another machine that was built later during the event. I had planned to rsync the built objdirs across systems, but somehow that didn't work even though I'd kept everything in a username-agnostic location (/opt). This is something I'll look into later.

But it turns out there's an easier way to do things than running full builds on the spot. @Debloper had the interesting idea of using OpenStack for this; after some discussion, the plan was basically to have an OpenStack instance where we create a VM with a full build environment, and allow participants to fork this VM and do all their coding/testing there (via ssh -X). This requires some investment in maintaining an OpenStack instance, but overall it's a viable way to go. We can also allow participants to keep access to the instance for some time period, to make the transition to development on their own systems much easier.

As an alternative to this, I had the idea of using flash drives instead of VMs. One way to do this is to install a persistent Ubuntu system[1] on a 16 GB flash drive, install the prerequisites, and build. This pen drive can then be booted into and used regardless of the user's system. It's persistent, too, so it can be used in the long term as well. It has the drawback of being a bit slower, though. Also, the drive can be quickly cloned via dd in preparation for an event. If users wish to install it baremetal, they can do so manually with dd and update-grub.

The other option is to make an Ubuntu live flash drive, but to customize it via squashfs and chroot and add the required packages along with a full build. Here, there won't be persistent storage, so anyone trying it out by booting into the flash drive will lose their work on reboot. However, this is easier to install baremetal since the standard installation process will work, and a baremetal install is faster, too. Again, the ISOs can be cloned.

If we want this to be scalable, we can eventually ask Mozilla to build these ISOs once every X days (or once every clobber) and put them up for download, much like Nightly builds. As far as I can tell, this won't create much extra strain on their resources. Then event organizers all over the world just have to burn the ISOs to some flash drives the night before, which is something very feasible.

The cherry on top of this method (Deb's awesome idea) is that these flash drives can double as swag. A Mozilla-branded drive is something pretty cool, especially if it contains everything you need for contributing to Firefox from wherever you are. The details of this depend on budget and all, but ... it's an option :)

There will still be architecture issues and speed issues, but these can be solved with some work. Using an older Ubuntu version like Backtrack does is one way to make things faster, and we can always have a couple of AMD flash drives ready.

I hope we get to try this method out at a similar event (maybe the upcoming Kolkata one). There are a lot of avenues to explore here, and a lot of room for improvement, but overall it seems like a promising way to fix the setup issues at such events.


[1] Or Fedora, but I haven't yet worked out the details for Fedora. I'll be trying this out when I have time.

Saturday, February 15, 2014

50 shades of green

Update: The streak has reached 100, read more about the additional projects I was working on here

Alright, four shades. Making 50 squares.


Yep, that's right, my GitHub commit streak reached 50 days!

What's GitHub? What's a GitHub commit streak?

Meow

For the uninitiated, GitHub is an online service that lets you efficiently manage repositories of code using the git protocol. Besides allowing for easy version control and collaboration on code (which are just features of the git protocol), it provides a bunch of useful collaboration tools like the issue tracker, and nifty features like pull requests. Most code hosted on GitHub is open source.

I keep most of my code on GitHub because

  • I can access my code from anywhere and make changes
  • I can use git without having to set up a bare repository on a remote server every time
  • It's open source, and I don't have to deal with the hassles of keeping it up to date elsewhere
  • It's pretty easy for others to report issues on it
  • It's easy for others to submit their own patches to the code via the pull request feature. I can also add collaborators with minimal hassle.
When using Git, a "commit" is basically a bundle of changes to the code, which can later be pulled/pushed between servers. On GitHub, if you've been committing code for a number of days in a row, it's called a "commit streak", and it's shown on your profile. Days with relatively more commits are shown as a darker shade of green on the punchcard.

How I got started

Initially I didn't have any intention of maintaining a commit streak. Near the end of December, I was working on both Charcoal and HostelNoticeboard, and after a week and a half of constantly committing code, I noticed that I had a commit streak going. Naturally, I was pretty happy and wanted to extend this.

I first set up some ground rules, inspired by Ryan Seys:
  • Issues don't count
  • Edits to READMEs don't count
  • Edits to non-code files like GitHub Pages files do count.
  • No scripted commits; push code the day you write it unless it's half-written
  • No playing with local commit times
I also identified repositories and mini-projects that I needed to work on beforehand. This actually got some of my backburnered ideas out of the woodwork, some of which I implemented.

The journey

Initially I found it challenging to commit code every day. I had a lot of other commitments (ha!) in life and didn't want to impinge on my academics. Usually it takes a bit of time to get warmed up before coding; one has to evaluate the situation and figure out what needs to be done. This, along with debugging, takes up quite a bit of time.

However, as time passed, I got more and more efficient at this so that I could spend more time writing real code. At the same time, maintaining the commit streak became a habit. I used to always have a terminal tab open for my cloned repositories, and would be hacking away every now and then.

Sticking to an agenda becomes natural after a point


There were some days when I thought that I would be too busy to code, and would instead make some minor changes to fill in the punchcard for that day. Almost every time, I ended up unexpectedly making more substantial contributions the same day. There were also some days when I would open the site in a panic that I'd forgotten to code that day, and it would turn out that I had committed code and just forgotten about it. I guess that's the first sign of madness, but who cares?

Use GitHub for your academics, too!
As exam time neared, I had to switch strategies. I had always planned to put up my LaTeX documents (notes, presentations, assignments) on GitHub, making it easier for me to share them, keep them up to date, and incorporate improvements. Till then I had been using scripts to upload them to my university homepage, which wasn't as efficient.

So I created CourseResources, and uploaded all the old documents I could find. Since I would be writing notes or assignments regularly, this provided a steady source of commits (also, a second motivation to study!) that helped me when I was too busy to write proper code. I still tried not to rely on this for the streak, though. The goal is to consolidate as many LaTeX notes as possible here; the repo is under an organization for easy collaboration. 

Where it's at now

So, in the past 50 days, the new projects I created are:

  • Kapi, a Metro note-taking app with fluid math support. I made it for a hackathon and plan to continue working on it.
  • IIT-Timetable, a webpage that lets one easily construct and share a printable semester timetable without having to worry too much about the complicated slot pattern. While there are plans to extend this, the app is complete in itself.
  • ChatExchange, a python wrapper for Stack Exchange Chat. Currently it has basic read/write functionality, but needs a lot of polishing. I also created multiple projects that use this as a submodule:
    • StackExchange-ChatBot: A python class that can be used to easily create a chatbot that can react to various commands. I created this today, and it doesn't do much yet but gives an idea of the basic structure.
    • SmokeDetector: A bot that monitors the Stack Exchange realtime feed and links to possible spam or otherwise low-quality posts in a couple of chatrooms so that they can be dealt with quickly. This was intended to solve the issue of spam lying around on low-activity sites when the moderators aren't around at that moment. The bot is currently running, though I tweak the algorithm every now and then.
    • ChatExchange-Scripts: A couple of random scripts created as proof-of-concepts.
  • CourseResources (both the CourseResources and Slides repos): As mentioned before, contains all my LaTeXed documents. Feel free to pull request and add your own.
  • daemonic-mach, a project to integrate inotify or watchman with Mozilla's mach build program to speed up build time. This is just a placeholder for now, I haven't yet gotten around to starting this. First commits don't count for a streak unless there are subsequent commits, so this didn't add to the streak.
  • ECMAScript6-tester, a script that loads dummy versions of proposed ES6 features into the document and reports compatibility of the document with these features. Intended to prevent naming collisions (like this one) where a prototype extension clashes with a new feature, breaking things. This repo is another placeholder.
Look at all this code I wrote!

In addition, I worked on the following preexisting projects (not necessarily my projects):
  • Charcoal, a webapp that lets one easily collect and flag noisy content (mainly comments) from Stack Exchange sites. I mainly dealt with the JS code in this.
  • HostelNoticeboard, the code (both Pi-side and server-side) for the Electronic Noticeboard project in IIT. I've written the Pi-side code and a portion of the server-side stuff. The code works and is currently deployed, on a single Pi with the online interface here. There are a bunch of improvements on the roadmap that I mean to get to in a few weeks.
  • waca: The Wikipedia Account Request System, running here. I usually do small bugfixes.
  • wncc.github.io: The Web & Coding club website, running here. I add posts and sometimes make changes to the Jekyll code.
  • Manish-Codes, random userscripts and things which I write.
All in all, plenty of code written, lots of work done :D

Where it's going to go

I really don't know how long I'll be able to keep this up. Academics do get in the way, and while I can make minor changes every day, that's not too productive. However, it's giving me a driving motivation to get all my backburnered projects finished, which is great! It's also taught me a lot about planning, and I got a good chance to hone my coding skills.

These 50 days have been really fun, though, and I hope I'll be able to keep it up as long as possible :)

Hope I get the time!





Octocats taken from the Octodex

Friday, February 7, 2014

Getting started with bug-squashing for Firefox

See also: Tips and Tricks For Fixing Your First Bug by Saurabh Anand

So over the past few months I've been trying to contribute a bit to Mozilla (mainly Firefox). Last August there was a MozBoot session at IIT Bombay which helped me get over the learning curve.

First off, a big thanks to @Debloper (and @hardfire) for showing me the basics. The process is intimidating, though once you've done it with help, it becomes pretty natural. These Mozilla reps got me past that intimidation point, so I'm really grateful for that.

This post is basically a tutorial on how to get started: an in-depth version of this tutorial, which I feel misses a few things.

Note that I am still a beginner at this, comments on how to improve this post/my workflow appreciated.

Ok, let's get started.

Step 1: Identifying a bug you want to fix

Firstly, make an account on https://bugzilla.mozilla.org/. You'll need it later. Browse the bug lists on the site, looking for bugs that seem fixable. Look for bugs marked as "good first bug", which have a status of "NEW".

Of course, this is a bit cumbersome to do, and there are a lot of bugs which are nontrivial or carry a lot of discussion baggage that you may not want to go through. Fortunately, there are some tools out there that greatly help in searching for bugs.

Firstly, there's What Can I Do For Mozilla?. This is an interactive questionnaire that helps you find out which portions of Mozilla or Firefox you may be able to comfortably contribute to. Note that this is not just Firefox, though if you select the HTML or JS categories you will be presented with the Firefox subcategory which contains various entries.

This doesn't help find bugs as much as it helps you find the areas of the codebase that you might want to look at.

However, there is a different tool that is built specifically for this purpose; to look for easy bugs given one's preferences and capabilities. It's called Bugs Ahoy, and it lets you tick your preferences and programming languages to filter for bugs. It also has two insanely useful options, one that lets you filter out assigned bugs, and one that tells it to look for "good first bugs" ("simple bugs"). "Good first bug"s on Bugzilla are easy bugs which are kept aside for new users to try their hand at. There is a mentor for these bugs, who is a very active community member or employee. These mentors help you through the rest of the process, from where you need to look in the code to how to put up a patch. I've found that the mentors are very friendly and helpful, and the experience of being mentored on a bug is rather enjoyable.

Make sure the bug isn't assigned to anyone, and look through the comments and attachments for details on the status of the bug. Some bugs are still being discussed, and some bugs are half-written (it's not as easy to use these for your first bug). If you need help on choosing a bug, join #introduction on irc.mozilla.org. There are lots of helpful people out there who can give feedback on your chosen bug, and help you get started.

Step 2: Finding the relevant bits of code

If this is a mentored bug, you can usually ask the mentor for help in a comment on the bug. Be sure to get it assigned to you! If the mentor doesn't respond in a few days, use the needinfo box at the bottom of the page:


Type the username (usually preceded by a colon somewhere in the full name string), and a suggestion box should pop up with various users. Pick your mentor out from the list, and ask for help in the comment box.

If you want to look for the code yourself, Mozilla Cross-Reference is a great tool. For Firefox, you probably want the mozilla-central subtree. With MXR, you can easily search the codebase for text, variable names, and regexes.

For most UI changes, you can track the code down by first looking for nearby strings. For example, if you want to look at the code for the where-do-I-save-downloads preference, which is preceded by the text "Save files to", the search result leads to a dtd file, which defines the entity saveTo.label as the string. (Remember, all displayed strings will be in a localization file.) Searching for saveTo.label turns up main.xul. Now that you've found this, you can dig deeper by looking at the event handling and figuring out where the relevant JavaScript is, or you can look around this same file and figure out how it works, depending on what you want to fix.

I've not really made any changes to the C++ yet, only the toolkit and UI javascript, so I can't comment on how one can find the C++ code relevant to a bug. But you can always ask around in IRC or ask your mentor (if any) for help.


Step 3: Downloading and building the code


Not all bugs need a build. Some are quite easy to fix without having a full copy of the code or a build, and while you'll eventually want both of these, it is possible to hold off for a while, depending on the bug. While it's easier to create patch files when the system is all set up, I will address patching without the full code in the next section.

Downloading can be done in two ways. Both require Mercurial to be installed (sudo apt-get install mercurial works).



One way is to simply hg clone https://hg.mozilla.org/mozilla-central. This will download the full repository. However, if you don't think your internet connection will be stable, download the mozilla-central bundle from here and follow the steps given there. Note that Mercurial is a bit different from Git, so you may wish to read up on the basics.

To build Firefox, you first need to set up your build environment. If you already have Mercurial and Python installed on Linux/OSX, the build environment setup is simply ./mach bootstrap, run from the root directory of the cloned repository. For setting it up on Windows, or for other corner cases, go here.

Once done, go to the root directory of the Firefox code and run ./mach build. After your first build, you can run incremental builds (which only build the files you ask for, plus anything that depends on them) by using ./mach build <list of filepaths>, e.g. ./mach build browser/components/preferences/. You can specify both folders and files for the incremental build.

Note that for some javascript files, you have to build their containing directory — so if your changes aren't getting reflected in the incremental build, try building the directory they are in.


Step 4: Getting a patch


So by this point you will have figured out the fix and modified the code so that you have a partial (for a multifaceted bug) or full fix of the bug. At this point you can submit the patch for review. For this, you need to have a patch to submit first!

Creating patches with hg


If you have the full cloned repository, first add these lines to your ~/.hgrc to enable the mercurial queues extension with the proper settings.

[ui]
username = Firstname Lastname <email@something.com>

[defaults]
qnew = -Ue

[extensions]
mq =

[diff]
git = 1
unified = 8
showfunc = 1


Once done, navigate to the firefox source tree and run hg qqueue -c somequeuenamehere. This will create a named patch queue that you can work on.

Now, run hg qnew patchname.patch. This creates a new patch by that name in the .hg/patches-queuename folder and applies it on top of the currently applied stack of patches. You can update its contents with the changes made to the code by running hg qrefresh, or simply hg qref. This patch is the one that you can submit in step 5.

When you run hg qnew, it will ask you to enter a commit message. Write the bug number and a short description of the patch ("Bug 12345 - Frob the baz button when foo happens"), and add a ";r=nameofreviewer". In the case of mentored bugs, the username of the mentor will be your reviewer. If not, you'll have to find a reviewer (more details on this later; for now you may leave this blank and edit the patch file later). Note that the default editor for this is usually vim, so you have to press Ins before typing text, and then Esc followed by :x and Enter to save.

Advanced usage


In case of complicated bugs or bugs which already have a patch, you can queue the patches up. Simply use hg qnew to create patches and  hg qpush or hg qpop to move up and down the patch queue (this will change the code to reflect the currently active patch, and hg qref will update that same patch)

If you want to work on a different bug in parallel, you just have to pop all current patches out, and create a new patch queue with hg qqueue -c. You can then switch between the queues with hg qqueue queuename.


Creating patches without hg


Since the full repository takes a really long time to download and unpack, it's useful to have a different way of making patches so that the download doesn't become a blocking step.

For preliminary patches, with just one file


This is for when you want to submit a patch that can be reviewed for feedback, but not checked in as a final patch. I wouldn't recommend using this method, but I'll keep these instructions here just in case.

If you're just editing one file, put the old version and the new version side by side, and run diff -u oldfile newfile > mypatch.patch in the same directory. Now, open the patch file and edit the paths to match the relative filepath of the edited file from the root Firefox directory (e.g. if you diffed main.xulold against main.xul, replace both names with browser/components/preferences/main.xul).

Proper patches


Put the files in a directory, and then run git init on the files. Now, git add * and then git commit -m "commit message" to commit the files.

After this, make your changes to the files. Then, run git diff -U8 > output.patch. Edit the patch and change the a/filename and b/filename lines to be a/path/to/filename and b/path/to/filename. The paths here are relative to the root directory.

Now, add the following to the top of the patch

# HG changeset patch
# Parent parenthash
# User Firstname Lastname <email@something.com>
Bug 12345 - Frob the baz button when foo happens; r=jaws
Set the commit message as described in the above section for creating patches with hg.


As for the parent hash, you can ignore and remove the line (or get it by going to the mozilla-central hg repository and copying the hash of the tip commit).


Step 5: Submitting the patch, and the review process

See also: Getting reviews

Now that you're at this stage, the rest is pretty smooth sailing. Find the "add attachment"  link on the bugzilla page:


Upload the attachment, give it a descriptive name ("Patch for barring the foo", though sometimes I just use "Patch 0.1"), and make sure the "patch" checkbox is ticked.


Now, you also need to ask for review. Click the dropdown next to the review menu, and set it to "?" ("requesting review"). Put the username of your reviewer in the "Requestee" box (and use the autosuggest to get the email address). If you don't know who to ask for review:
  • If the bug is mentored, your mentor will be able to review your code. Usually the mentor name will turn up in the "suggested reviewers" dropdown box in bold, too.
  • If the bug isn't mentored, you still might be able to find reviewers in the suggested reviewers dropdown. The dropdown is available for bugs in most Firefox and B2G components.
  • Otherwise, ask around in IRC or check out the hg logs of the file you modified (start here) to find out who would be an appropriate reviewer.
  • A list of module owners and peers for each module can be found here (the Firefox and Toolkit ones are usually the ones you want). These users are allowed to review code in that module, so make sure you pick from those. If you mistakenly pick someone else, they'll usually be helpful enough to redirect the review to the right person.

Usually, on the first bug, your review will come back negative ("r-"). This is nothing to be worried about; the mentors (and/or reviewers) are very helpful and will let you know exactly what can be improved. This is one of the things I like about Mozilla; everyone's quite helpful!

Once you fix the nits and make the other changes requested of you, re-upload the attachment (and mark the old one as obsolete).

At some point, the review will be granted, and the code will be checked in. Once that happens, the bug will get marked as resolved. And you're done with your first bug!