GitHub shares personal data of thousands of unsuspecting Firefox users

November 21, 2021

The repositories hosted on the GitHub platform are easily accessible and full of Firefox users’ personal data. Why doesn’t the site protect them?

GitHub is a popular software development platform with well over 40 million users. The site offers free hosting for open-source projects as well as private repositories, places where any kind of data can be stored and maintained. As a result, millions of people have accounts on the site, and many of them readily leave their data there. It turns out that not all of that data belongs there, and GitHub does not consider itself responsible for its users’ carelessness. Is that the right approach?

How does GitHub work?
Anyone with internet access and an email address can create an account on GitHub. Meeting these two requirements is enough for a user to publish code they have written on the platform. Although this seems very convenient, it can turn into Russian roulette for novice programmers.

When publishing our code, we must remember that only private repositories stay visible to us alone. Everything we add publicly automatically becomes available to every user of the platform. Thanks to this, programmers can learn from their own and others’ mistakes and take an active part in interesting projects. This simplicity, however, comes with all the drawbacks of social networks, only amplified.

Beginners may face destructive criticism, while experienced developers must accept that others will reuse their work. Moreover, writing code worth showing the world is only the beginning. For several thousand active users, the real problem turned out to be cookies and the way web browsers store them.

What personal data can we find in GitHub’s public repositories?
GitHub’s public repositories are easy to search, and many experienced developers browse them for fun. About a dozen days ago, security professional Aidan Marlin decided to spend some time doing the same.

He noticed that a suitable search query across public GitHub repositories returned several thousand results containing cookies stored by the Firefox browser. These are cookies.sqlite databases, which Firefox creates in the user’s profile and which held the cookies from the author’s browser sessions at the time the code was published.
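A cookies.sqlite file is an ordinary SQLite database, so anyone who downloads one can open it with standard tools. The minimal Python sketch below illustrates this by listing the sites a leaked file holds cookies for. It assumes the `moz_cookies` table and its `host` column used by Firefox, although the full schema varies between releases. `list_cookie_hosts` is a hypothetical helper. It deliberately never reads the `value` column, which contains the session tokens themselves.

```python
import sqlite3
from pathlib import Path


def list_cookie_hosts(db_path: str) -> list[str]:
    """Return the distinct hosts that have cookies in a cookies.sqlite file.

    Assumes Firefox's moz_cookies table with a `host` column; the rest of
    the schema differs between Firefox versions and is not touched here.
    """
    # Open read-only through an SQLite URI so the inspected file is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT DISTINCT host FROM moz_cookies ORDER BY host"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple; unpack it into a flat list of host names.
    return [host for (host,) in rows]
```

For example, running `list_cookie_hosts("cookies.sqlite")` on a copy of your own profile shows which sites you would be exposing by publishing it. Opening the database read-only (`mode=ro`) leaves the inspected copy unchanged.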

They end up next to the published code because the authors publish it from their Linux home directory. Firefox keeps its profile, including cookies.sqlite, under that directory (typically in ~/.mozilla/firefox), so publishing the whole directory also sweeps in unnecessary data such as the browser’s temporary files. Although this stems from user error, a single browser accounted for so many results that Marlin decided to report the issue to the platform via HackerOne.
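One way to avoid this mistake is to check a directory for browser profile files before turning it into a public repository. The sketch below walks a directory tree and reports any file whose name matches a short, illustrative list of Firefox profile files. Both the list and the `find_browser_artifacts` helper are assumptions for illustration. They are not a complete inventory or an official tool.

```python
import os
from pathlib import Path

# Illustrative, non-exhaustive list of Firefox profile files that should
# never be published (assumption: extend it for other browsers).
SENSITIVE_NAMES = {
    "cookies.sqlite",
    "cookies.sqlite-wal",
    "places.sqlite",
    "logins.json",
    "key4.db",
}


def find_browser_artifacts(root: str) -> list[str]:
    """Return sorted paths, relative to root, of known browser profile files."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if name in SENSITIVE_NAMES:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                # Normalize separators so results look the same on every OS.
                hits.append(Path(rel).as_posix())
    return sorted(hits)
```

Calling `find_browser_artifacts(".")` from the folder you are about to publish should return an empty list. Anything it reports belongs in `.gitignore` or outside the repository. Matching on file names is a deliberately simple choice: it is fast and needs no dependencies, but it will miss renamed copies.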

In response, GitHub said the problem was not on its side and closed the report.

Why isn’t the leaked data Firefox’s problem?
Mozilla responded in a similar vein, but its spokesperson’s answer sheds some light on GitHub’s position. The Firefox maker confirmed that exposing sensitive data this way is possible, if not typical, and not only with Firefox but with every web browser. The only remedy the spokesperson could recommend was Firefox Sync, which adds extra encryption.

It is a questionable solution, but Mozilla rightly saw no flaw in its product and diplomatically suggested additional precautions. When we use GitHub, the browser stores data the same way it does when we make an online bank transfer or book a doctor’s appointment online. An extra password or encryption of the cookie database is not standard, because cookies expire quickly anyway. It is therefore enough that the websites we use do not let us publish them. The whole problem is that GitHub does.

Why is GitHub leaking data?
The platform has not commented publicly, but its response to the report suggests that it does not want to take responsibility for its users’ security. Yet it should, for two reasons: it has no grounds to trust its users, and only GitHub is in a position to limit the problem effectively.

Why isn’t a GitHub user necessarily a developer?
GitHub users should be able to control what they publish. The problem is that GitHub cannot assume they have this skill: the site lets anyone use all of its resources and does not check the knowledge of those who create an account.

Calling itself a specialized platform is not enough to ensure that its users are advanced programmers who use the available tools consciously. That would require limiting access to the site and excluding hobbyists and novices. GitHub rejects this idea and lets everyone publish, tacitly leaving users to protect themselves.

How does GitHub increase the risk of data sharing?
Experienced developers may feel entitled to tease users who have mistakenly shared their personal data. However, the people responsible may have no idea that they have done anything dangerous. After all, their mistake is just one cookies.sqlite database, and a few days, or even hours, after it is published it may yield nothing useful to anyone. The problem is that their mistake is not one in a million but a common occurrence in public GitHub repositories.
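Whether a leaked database is still dangerous depends on how many of its cookies have not yet expired. The sketch below counts them. It assumes that the `moz_cookies.expiry` column holds a Unix timestamp in seconds, which is worth verifying against the Firefox version that produced the file. `count_live_cookies` is a hypothetical helper.

```python
import sqlite3
import time
from pathlib import Path
from typing import Optional, Tuple


def count_live_cookies(db_path: str, now: Optional[float] = None) -> Tuple[int, int]:
    """Return (unexpired, total) cookie counts for a cookies.sqlite file.

    Assumes moz_cookies.expiry is a Unix timestamp in seconds; check the
    unit for the Firefox version that wrote the file.
    """
    if now is None:
        now = time.time()
    # Read-only connection so the inspected database is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # SUM over a boolean expression counts matching rows; COALESCE
        # turns the NULL that SUM returns for an empty table into 0.
        live, total = conn.execute(
            "SELECT COALESCE(SUM(expiry > ?), 0), COUNT(*) FROM moz_cookies",
            (now,),
        ).fetchone()
    finally:
        conn.close()
    return live, total
```

Passing an explicit `now`, such as the time the file was committed, estimates how much a copy taken at the moment of publication would have contained, rather than how much is still valid today.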

A database posted a minute ago is like a gift-wrapped present. Just copy it into a Firefox profile folder, and we can be logged in everywhere its owner was logged in when publishing the code. You will not always find something interesting, but GitHub’s public repositories offer such material in large quantities and it is readily available. This makes them an attractive hunting ground for more than just bored programmers. Only GitHub can fully limit the resulting threats.