Hashes, Hashes, We All Fall Down
I’ve always relied on hashes to identify threats across our network. We recently hired a new security engineer who insists relying on hashes isn’t enough. Who’s right here?
Chapped Hash in Helsinki
In short, a hash is a (usually) unique, fixed-length string that represents a piece of data.
Let’s think of unique identifiers in our normal lives. Automobiles, for instance, are assigned unique identifying numbers, or VINs. No matter what happens to that vehicle, the VIN is its own, in permanence.
Hashing is to a file what a VIN is to an automobile. Mostly. They are similar, but not the same! We’ll get into the differences in a minute. (Hint: there’s a clue in the very first sentence of my response!)
How Are Hashes Generated?
Hashes are the output of an algorithm like MD5 (Message Digest 5) or one of the SHA (Secure Hash Algorithm) options. Since every file on a computer is essentially binary data, a hashing algorithm can process it and output a fixed-length string, sort of like a VIN. The result is the file’s hash value or message digest.
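To make the "fixed-length" part concrete, here is a minimal sketch using Python's standard hashlib module. The helper name hex_digest and the sample inputs are made up for illustration; they are not from any Sonatype tooling.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of `data` under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Illustrative inputs of very different sizes: empty, one byte, one million bytes.
samples = [b"", b"a", b"a" * 1_000_000]

for algorithm in ("md5", "sha1", "sha256"):
    # Collect the distinct digest lengths seen for this algorithm.
    lengths = {len(hex_digest(sample, algorithm)) for sample in samples}
    print(algorithm, "hex digest length(s):", lengths)
```

Whatever the size of the input, each algorithm yields a digest of one fixed length: 32 hexadecimal characters for MD5, 40 for SHA-1, and 64 for SHA-256.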
To calculate a file’s hash in Windows 10, you can use PowerShell’s built-in “Get-FileHash” cmdlet and feed it the path to the file whose hash value you want to produce. By default, it uses SHA-256, a member of the SHA-2 family. For a file called “Sonatype_logo_full_color.png”, for example, passing that file’s path to Get-FileHash returns its SHA-256 hash.
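If you are not on Windows, the same idea can be sketched in a few lines of Python. This is an illustration, not a replacement for Get-FileHash: the function name file_sha256 and the chunk size are assumptions chosen for the example.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large files do not need to fit
    in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_sha256("Sonatype_logo_full_color.png") from the folder that holds the file would return its 64-character SHA-256 digest as a lowercase hexadecimal string.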
How Are Hashes Used?
Simply put, hashes let us quickly identify and compare the contents of files without having to inspect the files themselves, so long as (and this is important!) our algorithm does not produce the same hash value for two different files. When two different inputs do produce the same hash value, it is called a “collision”, which isn’t a good thing when we’re trying to determine whether two entities are the same. This is why having a unique identifier is important.
The odds of two files colliding by accident are vanishingly small, but deliberately constructed collisions are not unheard of: researchers have demonstrated practical collisions for both MD5 and SHA-1, which is why more secure hashing algorithms like SHA-2 are replacing them.
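Here is a rough sketch of that idea: grouping pieces of data by their hash to spot identical contents without comparing them byte by byte. The function name group_by_hash and the sample file names are hypothetical, and the sketch assumes SHA-256 collisions are not a practical concern.

```python
import hashlib
from collections import defaultdict


def group_by_hash(blobs):
    """Map each SHA-256 digest to the names of the blobs that share it.

    `blobs` is a dict of name -> bytes. Names that end up under the same
    digest are treated as having identical contents.
    """
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return dict(groups)


# Hypothetical sample data: two names with the same contents, one different.
blobs = {
    "report.txt": b"quarterly numbers",
    "report_copy.txt": b"quarterly numbers",
    "notes.txt": b"something else entirely",
}

for digest, names in group_by_hash(blobs).items():
    print(digest[:12], names)
```

The two files with the same contents land under one digest regardless of their names, while the third gets a digest of its own.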
At Sonatype, when the evaluation of a component is performed, a one-way hash of the component is created. We also refer to this hash as a fingerprint. That fingerprint is then compared back to the Nexus Intelligence catalog, which provides customers with all the available information on that component. This information might include usage statistics, security vulnerability information, and license information.
Security researchers can use the unique hash of a potentially malicious file on one machine and quickly determine how far its tentacles reach across an entire network. Using hash values, researchers can reference malware samples and share them with others through malware repositories like VirusTotal, VirusBay, Malpedia, and MalShare.
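A very simplified version of that kind of hunt might look like the sketch below: walk a directory tree and report every file whose SHA-256 matches a known-bad hash. The names sha256_of and find_matches are invented for this example, and a real deployment would more likely rely on endpoint tooling than on a script like this.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root, known_bad_hashes):
    """Return paths under `root` whose SHA-256 is in `known_bad_hashes`."""
    # Normalize case so uppercase and lowercase hex strings both match.
    wanted = {value.lower() for value in known_bad_hashes}
    matches = []
    for path in Path(root).rglob("*"):
        try:
            if path.is_file() and sha256_of(path) in wanted:
                matches.append(path)
        except OSError:
            # Skip files that cannot be read (permissions, locks, and so on).
            continue
    return matches
```

Given a hash taken from a suspicious file on one machine, find_matches(some_directory, {that_hash}) would list every identical copy under that directory.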
So to summarize, you’re both right. Hashes aren’t without value. They can be used in threat hunting to quickly determine the identity of a file, so long as you are using a hashing algorithm for which collisions are not practical to produce. No algorithm can rule them out entirely, since a fixed-length string cannot be unique for every possible input. However, relying solely on hashes in the seek-and-destroy world of security is a flawed approach because 1) collisions are possible and 2) two different malicious files have two different hashes, so even a trivially modified copy of known malware will not match the hash you are hunting for.
In the end, perhaps it’s best to think of hashing and hashes as a tool, or a part of a more comprehensive approach to security, as opposed to a solution in and of itself.
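The second point is easy to see in a small sketch. The "payload" here is just a made-up byte string standing in for a malicious file, and is_blocked is a hypothetical helper, not a real product feature.

```python
import hashlib

# Stand-in for a known malicious file, and a copy with one extra byte appended.
original = b"pretend this is a malicious payload"
variant = original + b"\x00"

# A blocklist containing only the hash of the original sample.
blocklist = {hashlib.sha256(original).hexdigest()}


def is_blocked(data, known_hashes):
    """Return True if the SHA-256 of `data` appears in `known_hashes`."""
    return hashlib.sha256(data).hexdigest() in known_hashes


print(is_blocked(original, blocklist))  # True
print(is_blocked(variant, blocklist))   # False
```

A single appended byte is enough to produce a completely different hash, so a hash-only blocklist catches exact copies and nothing else.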
~ Making Cyber a Safer Space