A cryptographic hash function (CHF), also called a hashing algorithm or simply a hash function, is a mathematical algorithm that maps
data of any length (often called the "message") to a bit array of fixed length (the "hash value", "hash", or "message digest").
It is a one-way function: computing the hash from the data is easy, but inverting it to get back the original data is practically infeasible.
Because inputs can be of any length while the output has a fixed length, some distinct inputs must share the same hash value. Such a pair
is known as a hash collision. A hash algorithm can only be considered good and acceptable if finding a collision is computationally infeasible in practice.
A good hashing algorithm also exhibits a property called the avalanche effect, where the resulting hash output changes significantly
or entirely even when a single bit or byte of the input is changed. A hash function that does not do this is considered to have poor
randomization, which makes it easier for attackers to break.
Ideally, the only way to find a message that produces a given hash is to attempt a brute-force search of possible inputs to see if
they produce a match, or to look the hash up in a precomputed table of known inputs and their hashes (such as a rainbow table). Cryptographic hash functions are a basic tool of modern cryptography.
For example, take the message
“The quick brown fox jumps over the lazy dog”
Once we run a specific hashing algorithm over it, let’s say SHA-256, we get the result below:
“d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592”
This result is known as a hash or message digest.
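The digest above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's hashlib module; it assumes the message is encoded as UTF-8 before hashing, which the text does not state explicitly.

```python
import hashlib

message = "The quick brown fox jumps over the lazy dog"

# hashlib operates on bytes, so the text is encoded first (UTF-8 is an assumption).
digest = hashlib.sha256(message.encode("utf-8")).hexdigest()

print(digest)
# d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
```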
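Hashing is sometimes loosely called one-way encryption, although strictly speaking it is not encryption, because no key exists that turns the hash back into the data. It is almost impossible to revert the hash value back to
“The quick brown fox jumps over the lazy dog”,
though this does not always hold: a short or easily guessed input can be recovered by hashing candidate inputs until one of them matches.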
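To illustrate why reversing a hash is not always out of reach, the sketch below recovers an input that comes from a very small space. The 4-digit PIN scenario, the function names, and the value "4821" are all hypothetical and chosen only for the demonstration.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force_pin(target_hash: str) -> Optional[str]:
    """Try every 4-digit PIN and return the one whose hash matches, if any."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


secret_hash = sha256_hex("4821")  # hypothetical secret, known here only as a hash
print(brute_force_pin(secret_hash))  # 4821
```

Only 10,000 candidates need to be hashed here, so the search finishes almost instantly. The same approach is hopeless for long, unpredictable inputs, which is what makes the function one-way in practice.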
Again, the avalanche effect says that even the smallest change to the data will produce a very different hash value. Let’s make a small change to our data above:
“Tha quick brown fox jumps over the lazy dog”.
We have replaced the ‘e’ in “The” with an ‘a’. The new hash output is:
“0a4cbc8b6ca6e2b6230145547bc6512ce83352d6689a233561a54eb0c9c387cc”
You can see that the two hashes look unrelated, yet both have the same fixed length (64 hexadecimal characters, that is, 256 bits).
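One way to put a number on the avalanche effect is to count how many of the 256 output bits differ between the two digests. The sketch below does this for the two sentences used above. The helper names are illustrative, and no specific count is claimed here; for a well-behaved hash the count is expected to be close to half of the bits.

```python
import hashlib


def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = sha256_bytes("The quick brown fox jumps over the lazy dog")
modified = sha256_bytes("Tha quick brown fox jumps over the lazy dog")

changed = differing_bits(original, modified)
print(f"{changed} of {len(original) * 8} bits differ")
```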
1.1 The ideal cryptographic hash function has the following main properties:
• it is deterministic, meaning that the same message always results in the same hash.
• it is quick to compute the hash value for any given message.
• it is infeasible to generate a message that yields a given hash value
(i.e. to reverse the process that generated the given hash value).
• it is infeasible to find two different messages with the same hash value.
• a small change to a message should change the hash value so extensively that
the new hash value appears uncorrelated with the old hash value (avalanche effect).
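Two of these properties, determinism and the fixed output length, are easy to check directly. The following is a small sketch using SHA-256 from Python's hashlib; the sample inputs are arbitrary. The infeasibility properties cannot be demonstrated by running code, so they are not covered here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


# Arbitrary sample inputs, from empty to one million bytes.
samples = [b"", b"a", b"The quick brown fox jumps over the lazy dog", b"x" * 1_000_000]

for data in samples:
    first = sha256_hex(data)
    second = sha256_hex(data)
    # Deterministic: hashing the same bytes twice gives the same digest.
    assert first == second
    # Fixed length: 256 bits are always rendered as 64 hex characters.
    assert len(first) == 64
    print(len(data), first[:16] + "...")
```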
1.2 What are the benefits of Hashing?
One main use of hashing is to compare two files for equality. Without opening two document files to compare them word-for-word, the calculated hash values of these files will allow the owner to know immediately if they are different.
Hashing is also used to verify the integrity of a file after it has been transferred from one place to another, for example in a file backup program like SyncBack. To ensure the transferred file is not corrupted, a user can compare the hash values of both files. If they are the same, the transferred file can be treated as an identical copy, since an accidental collision is extremely unlikely with a good hash function.
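The comparison described above could be sketched as follows. The function names and the chunk size are illustrative assumptions, and this is not how any particular backup program implements it. The file is read in chunks so that large files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks and return its SHA-256 digest as hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(source: Path, copy: Path) -> bool:
    """Return True when both files have the same SHA-256 digest."""
    return file_sha256(source) == file_sha256(copy)
```

A call such as same_content(Path("report.docx"), Path("backup/report.docx")), with hypothetical paths, returns True only when the two digests match.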
In some situations, an encrypted file may be designed so that neither its size nor its last modification date and time ever changes (for example, virtual drive container files). In such cases, it would be impossible to tell at a glance whether two similar files are different, but their hash values would easily tell them apart.
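The situation can be simulated with two small temporary files that have the same size and the same modification time but differ in a single byte. This is only a sketch: the file names, contents, and timestamp are made up for the demonstration, and real container files would be far larger.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    # Reads the whole file at once, which is fine for a small demo file.
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "container_a.bin"  # hypothetical file names
    b = Path(tmp) / "container_b.bin"
    a.write_bytes(b"\x00" * 1024)
    b.write_bytes(b"\x00" * 1023 + b"\x01")  # same size, last byte differs

    # Force identical access and modification times on both files.
    fixed_time = 1_600_000_000
    os.utime(a, (fixed_time, fixed_time))
    os.utime(b, (fixed_time, fixed_time))

    same_size = a.stat().st_size == b.stat().st_size
    same_mtime = a.stat().st_mtime == b.stat().st_mtime
    same_hash = file_sha256(a) == file_sha256(b)

print(same_size, same_mtime, same_hash)  # True True False
```

The size and timestamp checks both report a match, while the hash comparison shows that the contents differ.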