With all the recent password leaks and other security-related blunders, I decided to talk about password hashing and explain it so that people who aren’t computer savvy can understand. Hopefully readers will gain a better understanding of how passwords are stored and how to create good ones.

Plain Text

You might have heard the term “plain text” when hearing about passwords or one of the recent leaks (or the mega-Sony leak of last year). This is sometimes referred to as clear text. So what does this mean? It simply means the password is stored so any person can read it. Like so:

Passw0rd

The dangers of this should be pretty obvious: anyone who has access to the password knows what it is. Considering that in a database a user name (or even worse, an email address) usually accompanies the password, the danger is even greater, especially if you’re one of those people who reuse passwords. Hint: don’t do that. And if a database is leaked… game over. So how do we prevent someone who has access to the password from knowing what it is? This is where hashing comes in.

Hashing

Hashing is a way of obfuscating text. It can be any text, but it’s commonly used for passwords. It’s a one-way algorithm that transforms the password into a fixed-length string of seemingly random letters and numbers (in hexadecimal, meaning each “digit” goes from 0 to 15, which is represented by 0 to 9 and a to f). As just mentioned, this algorithm is one-way, so there’s no practical way to reverse the hash. Passwords at the bare minimum should be stored as hashed values in the database, and the plain text versions should never be stored under any circumstances.

There are many different hashing algorithms. Some are better than others. The two most common ones are MD5 and SHA-1. Neither of these algorithms is considered secure anymore, but they are still widely used. Each hashing algorithm is one-way and always produces a fixed-length string of characters, known as a hash. MD5 outputs a 32-character string and SHA-1 outputs a 40-character string.
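
To make the fixed-length point concrete, here is a minimal sketch using Python’s standard hashlib module. The helper name hex_digest and the sample inputs are my own, chosen only for illustration.

```python
import hashlib


def hex_digest(algorithm: str, text: str) -> str:
    """Return the hexadecimal hash of `text` using the named algorithm."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


# Inputs of very different lengths (illustrative values only).
samples = ["a", "Passw0rd", "a much longer passphrase made of many words"]

for text in samples:
    md5_length = len(hex_digest("md5", text))
    sha1_length = len(hex_digest("sha1", text))
    # The output length depends on the algorithm, not on the input.
    print(md5_length, sha1_length)
```

Each line printed should read 32 40, no matter how long the input is.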

Let’s take a look at the MD5 output for Passw0rd, the example used above:

d41e98d1eafa6d6011d3a70f1a5b92f0

As one can see, there’s no real way to tell what password this is just by looking at it. It is important to note that each algorithm always produces the same output for the same input. In other words, every time one enters Passw0rd, the result will always be what’s above. This is how websites and other applications are able to authenticate users: they hash whatever the user types in and compare it with the stored hash. If the user enters an incorrect password, the hashed output will be different from what’s in the database, and the user will not be authenticated.
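
The login check described above might look roughly like this in Python. This is only a sketch: the names md5_hex, stored_hash, and check_login are hypothetical, and MD5 is used purely to match the example above, not because it is a good choice.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Return the MD5 hash of a password as a hexadecimal string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# What a (hypothetical) database row would hold instead of the password.
stored_hash = md5_hex("Passw0rd")


def check_login(attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(md5_hex(attempt), stored)


print(check_login("Passw0rd", stored_hash))  # same input, same hash
print(check_login("passw0rd", stored_hash))  # different input, different hash
```

The first check succeeds and the second fails, even though the application never stored the original password.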

Another important thing to note is that input that is sequential or closely related will result in a drastically different output. Here’s a quick example using SHA-1:

Passw0rd is ebfc7910077770c8340f63cd2dca2ac1f120444f in SHA-1
passw0rd is 7c6a61c68ef8b9b6b061b28c348bc1ed7921cb53 in SHA-1

As you can see, changing just the first letter from uppercase to lowercase has a drastic effect on the output, and the two do not look similar at all. This helps make the algorithms stronger.
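
For the curious, the difference can be measured rather than eyeballed. The sketch below counts how many of the 160 bits in the two SHA-1 outputs differ; the helper names are my own. For a well-behaved hash one would expect roughly half of the bits to change, but run it yourself rather than taking my word for the exact number.

```python
import hashlib


def sha1_bytes(text: str) -> bytes:
    """Return the raw 20-byte SHA-1 hash of `text`."""
    return hashlib.sha1(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = sha1_bytes("Passw0rd")
second = sha1_bytes("passw0rd")

print(differing_bits(first, second), "of", len(first) * 8, "bits differ")
```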

Predictable Output

While the output isn’t predictable to humans, it certainly is to computers. In an effort to “crack” hashed passwords, people have created “dictionaries” and what’s known as rainbow tables to cover as many hash possibilities as possible. These dictionaries can contain billions of entries, and you can bet that the common passwords people choose on a regular basis are in there. There are rainbow tables that cover all combinations of upper and lowercase letters, numbers, and symbols up to eight characters for the MD5 algorithm. This is why it’s so important not to use some simple password like J@son123.
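
Here is a toy version of such a hash dictionary in Python. The five-entry word list is a made-up stand-in for the billions of entries a real one would hold, and this is a plain lookup table, not a true rainbow table (those use a more compact chained structure).

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of `text` as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Tiny, hypothetical list of commonly chosen passwords.
common_passwords = ["123456", "password", "Passw0rd", "qwerty", "letmein"]

# Precompute every hash once; after that, "cracking" is a simple lookup.
hash_dictionary = {md5_hex(word): word for word in common_passwords}

leaked_hash = md5_hex("Passw0rd")  # pretend this came from a leaked database
print(hash_dictionary.get(leaked_hash))  # prints Passw0rd

unusual_hash = md5_hex("correct horse battery staple 9241")
print(hash_dictionary.get(unusual_hash))  # prints None: not in the list
```

The attacker never reverses the hash. They only need your password to be one of the guesses they hashed in advance.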

This suddenly doesn’t sound very secure. How do we mitigate the effects of precomputed hash dictionaries and rainbow tables? We use what’s known as a salt.

Salting

Salting is adding an additional string to the password, and then hashing the new string. Salts are usually added to the front or end of the password. There are a few caveats about salts, though. Each salt should be unique: the same salt should never be used more than once, and it should change every time the user changes their password. The salt should also be long and random, probably the same length as the hash algorithm’s output. The longer and more random, the stronger it is. I personally like to use a combination of PHP’s mt_rand() function (without arguments), which generates a random number between 0 and 2147483647, and the current date and time, formatted in a specific way, accurate to the second. Keep in mind that both of those values are fairly predictable, so a cryptographically secure random source is the safer choice.
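
As a rough sketch of the idea in Python: the example below swaps in the standard secrets module, which draws from a cryptographically secure source, in place of the mt_rand() and timestamp approach. The function names are hypothetical, the salt is placed in front of the password as one of the two common options, and SHA-1 is used only to match the examples in this post.

```python
import hashlib
import secrets


def new_salt(num_bytes: int = 20) -> str:
    """Return a fresh random salt as hex (20 bytes gives 40 characters,
    the same length as a SHA-1 hash)."""
    return secrets.token_hex(num_bytes)


def salted_sha1(salt: str, password: str) -> str:
    """Put the salt in front of the password and hash the combined string."""
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


salt = new_salt()  # generate a new one whenever the password changes
print(salt)
print(salted_sha1(salt, "Passw0rd"))
```

Both printed values will be different every time this runs, which is the whole point.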

Adding a salt has two major strengths: 1) a good salt renders precomputed hash dictionaries and rainbow tables useless, forcing an attacker to fall back on brute-force techniques for each password individually, which could take centuries or longer for a strong password, and 2) if two users are using the same password, the hashes that are stored in the database will look completely different. Since a lot of common passwords are reused, many unsalted hashes in a large database would be identical, and an attacker would only have to crack one of them to reveal every account using that password. Giving each user a unique salt prevents this.

Simply put, adding a salt is just adding a string that’s so long, so ridiculously random, that the combined string is very unlikely to appear in any hash dictionary or rainbow table. Take a look at the following two passwords. Which do you think is more likely to appear in someone’s list?

Passw0rd
d32c00e3c7935e76da471babeda400c902903ee0Passw0rd

And what if three people in the database are all using the password Passw0rd?

The SHA-1 hash values of the following passwords

4e4940aa6f9df8148aafa6ed458d583091b6c162Passw0rd
79564deffc86af62e08f90d5cc432880357f5773Passw0rd
b5c4f7b8919242ed4316e302deb50748eb235812Passw0rd

Result in:

2a22aeaf4e0a6f10f475effa1256feff3b39b328
23da0509aaebb5b0e5911f9e1a5fb624f3bc6b0a
7e130f3cbb36cb86cd3caef24c554605c363012f

It’s not obvious that all three of those users are using the same password. Salting is essential to password security and every website should use it. Salts are usually stored in the database, and while this may seem detrimental, keep in mind that for an attacker to break the hash, they would have to apply the salt to every single entry in their dictionary, which would take an enormous amount of time — per password! Also don’t forget that the attacker has no way of knowing (unless they are able to get a hold of the application’s source code, which has happened before) whether the salt is appended to the front, end, or something else entirely before being hashed. Even if they did know, the amount of time required to break even a single password becomes infeasible.
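
Putting the pieces together, storing and checking a salted password could be sketched like this in Python. Everything here is illustrative: the user names, the function names, and the choice to place the salt in front are assumptions, and SHA-1 appears only because it is the algorithm used in the examples above.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[str] = None) -> Tuple[str, str]:
    """Return (salt, hash). A fresh random salt is made if none is given."""
    if salt is None:
        salt = secrets.token_hex(20)
    digest = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(attempt: str, salt: str, stored_digest: str) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    _, digest = hash_password(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)


# Three hypothetical users who all picked the same password.
database = {name: hash_password("Passw0rd") for name in ("alice", "bob", "carol")}

for name, (salt, digest) in database.items():
    print(name, salt, digest)  # three different salts, three different hashes

alice_salt, alice_digest = database["alice"]
print(verify_password("Passw0rd", alice_salt, alice_digest))  # True
print(verify_password("passw0rd", alice_salt, alice_digest))  # False
```

The salt sits right next to the hash in the database, and the application needs it there in order to check a login. An attacker who steals both still has to redo all of their guessing separately for each user.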

Conclusion

I hope password hashing and storage have been explained simply enough for everyone to understand, and I hope that after reading this post, you reevaluate your passwords. One other thing to keep in mind: hashing algorithms can handle any kind of string input and do not require limitations on length or characters. Websites that impose arbitrary restrictions on your passwords should set off red flags, because it can be an indicator that they’re storing the passwords in plain text. Think twice about the passwords you use for these websites and services.