Passwords in logs: why, what and how?

A couple of major companies, a couple of days, and a couple of very similar security incidents. This week, both GitHub and Twitter disclosed incidents involving internal logging systems capturing plaintext (not encrypted or hashed) passwords. In this post I’ll break down why this happens, what it means, and how it was likely discovered.

Firstly, a disclaimer. I have no insider knowledge about these two incidents. I’m making assumptions throughout based on very similar incidents I’ve seen throughout my career in Information Security, particularly in incident response. Also, I think both GitHub and Twitter did exceptionally well, and acted honorably by disclosing these incidents to their users and offering the chance to remediate.

Why?

Before addressing why this happens, there is another ‘why’ that must be discussed: why does it matter? This question has been put to me a couple of times since both incidents, and it’s usually phrased something like this: “well, surely GitHub/Twitter knows my GitHub/Twitter password anyway, that’s how they know it’s me, what’s the problem?”

Actually, they aren’t supposed to know your password. No one is, except for you. Sites, including Twitter and GitHub, use a hashing mechanism and a random piece of data called a ‘salt’ to generate a string of characters, called a hash, that doesn’t resemble your password at all. When you log in, the hash is generated again from what you typed, and compared to the stored one. If they match, you’re in. Hence, Twitter and GitHub know you’ve entered the correct password, but they don’t know what it is. This is how it should be, and a great way to tell if a site is engaged in poor security practices is to use the ‘forgotten password’ function. If you get an email containing your old, forgotten password, something is up. They shouldn’t know it to be able to send it.
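
To make the salt-and-hash idea concrete, here is a minimal sketch in Python. I don’t know what either company actually runs, so treat everything here as an assumption: the function names, the choice of PBKDF2 from the standard library, and the iteration count are mine, picked purely for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor, chosen for illustration only; tune it for real use.
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the digests."""
    _, candidate = hash_password(attempt, salt)
    # Constant-time comparison, so timing doesn't leak how many bytes matched.
    return hmac.compare_digest(candidate, stored_digest)
```

Only the salt and the digest would be stored. At login the site re-runs hash_password on whatever you typed and compares the result, so it can confirm a match without ever keeping the password itself.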

Ok, so why?

As for why this happens, the answer, somewhat ironically, is partly because security teams have become victims of their own success. One of our favourite sayings in Information Security is “log all the things!” We love to log every request, every database transaction, every login and logoff event, and we should be doing this. It means we can use those logs to detect security incidents in real time, or revisit them later. If we didn’t have them, we’d be blind to what had occurred. It’s not just information security teams who like logs; operations teams use them to know when things aren’t working as they should, and developers use them to debug issues.

But what we, in security, really mean when we say “log all the things!” is, “log all the things, except the secret things!” There are certain things that should never be logged. Passwords clearly fall into this category, but it also includes other secret things like encryption keys and access tokens. In our desire to log all the things, frequently, we inadvertently capture more than we’d bargained for.
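
One way to put “except the secret things” into practice is a redaction filter in the logging pipeline. The sketch below uses Python’s standard logging module. The key names and the regular expression are my own assumptions, and it only catches simple key=value or key: value text, so think of it as a safety net and not a substitute for keeping secrets out of log calls in the first place.

```python
import logging
import re

# Hypothetical list of key names to treat as secret; extend to suit your app.
SECRET_KEYS = ("password", "passwd", "token", "api_key", "secret")

# Matches e.g. "password=hunter2" or "access_token: abc123".
_SECRET_PATTERN = re.compile(
    r"(?i)([\w-]*(?:" + "|".join(SECRET_KEYS) + r")[\w-]*)(\s*[=:]\s*)([^\s&,;]+)"
)


class RedactSecretsFilter(logging.Filter):
    """Replace the values of secret-looking keys before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so %-style arguments are covered too.
        message = record.getMessage()
        record.msg = _SECRET_PATTERN.sub(r"\g<1>\g<2>[REDACTED]", message)
        record.args = ()
        return True  # keep the record, just with the secrets masked


# Attach the filter to the handler so every record it writes is scrubbed.
handler = logging.StreamHandler()
handler.addFilter(RedactSecretsFilter())
```

I’ve put the filter on the handler and not on a logger because, in Python, a filter on a logger isn’t applied to records that propagate up from child loggers, whereas a handler’s filter sees everything that handler is about to write.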

Logging has also become more widespread due to the reduced costs of storage for said logs, and the rapid growth of cloud-based logging service providers.

But didn’t you just say the passwords aren’t known to the sites in question? So, how can they log them anyway?

You’re right; however, the plain, unscrambled version of your password has to make it from your fingertips to the site in question so they can hash it to make the comparison. From the moment you hit return to log in to a site, your password is sent as part of an HTTP request (wrapped in a technology called TLS, to protect it along the way) to a server listening on the other side. That server will be briefly aware of the plaintext version of your password in its memory, before passing it to the hashing function. It’s at this point that a logging tool could grab the whole request and stash it. Or a network performance or web application analytics tool could capture it and store it indirectly in its logs. If this happens, you’ve inadvertently created a repository of all the secrets you’ve worked so diligently to protect.
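
If a tool really does need to record request bodies, the sensitive fields can be masked before anything is written. Here is a rough sketch for a form-encoded login request. The field names and the loggable_body helper are hypothetical, and a real application would need the same treatment for JSON bodies, headers and query strings.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form field names that must never reach a log file.
SENSITIVE_FIELDS = {"password", "new_password", "current_password"}


def loggable_body(raw_body: str) -> str:
    """Return a copy of a form-encoded request body that is safe to log."""
    pairs = parse_qsl(raw_body, keep_blank_values=True)
    safe_pairs = [
        (key, "REDACTED" if key.lower() in SENSITIVE_FIELDS else value)
        for key, value in pairs
    ]
    return urlencode(safe_pairs)


# The password value is masked; the other fields pass through untouched.
print(loggable_body("username=alice&password=hunter2"))
```

The idea is to do the masking at the point of capture, so the plaintext never lands on disk and there is nothing to scrub later.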

What’s the risk then?

Let’s consider two kinds of risk: risks to the end user (a.k.a. you and me) and risks to the site operator.

If you’re in GitHub’s or Twitter’s shoes, you’re really going to want to scrub those log files containing passwords as soon as possible. If there is a future breach-like event (and to be clear, there has been no evidence of such a breach in either of these cases) and those logs are targeted, well, the net impact is just as bad as if you were storing passwords in plaintext in your main database, which would be extremely reckless and damaging.

For you and me, as users of those services, the risks are a lot less significant than if our passwords had been exposed to an audience outside of Twitter and GitHub employees. Our risk level is further reduced when we use multi-factor authentication to secure our accounts, and further still when we ensure that we use a unique password on every site we use, as the loss of one doesn’t constitute the loss of all.

Still, as previously mentioned, the rule is clear: ‘only you should know your password’. So while it’s unlikely that anything will come of this, go change it, and go change the passwords on any other sites that share your Twitter or GitHub password.

Both of these services offer multi-factor authentication, so that should be enabled too. Multi-factor authentication is designed for precisely this situation, where one piece of authentication data has become compromised.

How come both in the same week?

It’s an interesting question: how did both GitHub and Twitter discover the same type of incident in the same timeframe? While I don’t know the definitive answer, I can hazard a guess based on what is currently going on in the realm of information security and compliance: GDPR. Those four letters represent the ‘General Data Protection Regulation’, an EU privacy law that is set to take effect on May 25th, 2018. GDPR provides EU residents with a selection of rights with regard to the privacy of their data.

If you hold data on EU residents, you have to comply with GDPR, because if you don’t, the regulation promises sizable fines.

Both Twitter and GitHub of course hold information about their users who reside in the EU. Both likely have sizable budgets for security and compliance, and therefore it’s not beyond the realms of possibility that they both were conducting an internal GDPR readiness audit when these findings surfaced. That’s where I’d place my money anyway.

Of course, it could just be that the GitHub incident inspired someone at Twitter to check something, and they found a problem.

In any case, the issues have been found, remediated and disclosed in good time. I think the fact that both of these companies responded in the way they did is a very positive sign that security incidents, even those that occur internally, are being handled with increased levels of diligence and seriousness.

Information security professional specializing in SecOps, IR and Digital Forensics. Author of the Digital Forensic Diaries, and now, the Pen Test Diaries.