@GregHewgill you are right, but we are not talking about the original hash algorithm colliding (yes, SHA-1 collides, but that is another story). If you have a 10-character hash, you get higher entropy if it is encoded with base64 rather than base16 (hex). With base16 you get 4 bits of information per character; with base64 the figure is 6 bits per character. In total, a 10-character hex hash carries 40 bits of entropy, while a base64 one carries 60 bits. So it is slightly more resistant; sorry if I was not super clear.
– Nov 13 '13 at 14:35
You need to hash the contents to come up with a digest. There are many hashes available, but 10 characters is pretty small for the result set. Way back, people used CRC-32, which produces a 32-bit hash (4 bytes). There is also CRC-64, which produces a 64-bit hash (8 bytes). MD5, which produces a 128-bit hash (16 bytes/characters), is considered broken for cryptographic purposes because two messages can be found that have the same hash.
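To make the size arithmetic concrete, here is a small sketch in Python (my choice of language, the thread does not prescribe one). The helper `chars_needed` and the sample input are made up for illustration; `zlib.crc32` and `hashlib.md5` are standard-library calls.

```python
import hashlib
import math
import zlib


def chars_needed(bits: int, bits_per_char: int) -> int:
    """Characters needed to carry `bits` bits at `bits_per_char` bits per character."""
    return math.ceil(bits / bits_per_char)


data = b"example message"  # arbitrary sample input

crc = zlib.crc32(data)                         # an unsigned 32-bit integer
crc32_bits = 32
md5_bits = hashlib.md5(data).digest_size * 8   # 16 bytes -> 128 bits

for name, bits in (("CRC-32", crc32_bits), ("MD5", md5_bits)):
    print(
        f"{name}: {bits} bits, "
        f"{chars_needed(bits, 4)} hex chars, "
        f"{chars_needed(bits, 6)} base64 chars (unpadded)"
    )

# Going the other way: how much a 10-character string can carry.
hex_bits_in_10 = 10 * 4   # 4 bits per hex character
b64_bits_in_10 = 10 * 6   # 6 bits per base64 character
print(hex_bits_in_10, b64_bits_in_10)
```

So a 10-character budget holds a full CRC-32 in hex, but only a fraction of an MD5 digest in either encoding.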
It should go without saying that any time you create a 16-byte digest out of an arbitrary-length message, you're going to end up with duplicates. The shorter the digest, the greater the risk of collisions. However, your concern that the hash not be similar for two consecutive messages (whether integers or not) should be met by any good hash: even a single-bit change in the original message should produce a vastly different digest.
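If you want to see that behaviour for yourself, the following sketch hashes two consecutive integers (as strings) and counts how many digest bits differ. MD5 and the inputs `1000`/`1001` are just my picks for the demonstration, and `bit_difference` is a hypothetical helper, not a library function.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.md5(b"1000").digest()
d2 = hashlib.md5(b"1001").digest()
diff = bit_difference(d1, d2)

print(d1.hex())
print(d2.hex())
print(f"{diff} of {len(d1) * 8} bits differ")
```

For a well-behaved hash you would expect roughly half of the 128 bits to differ, even though the inputs differ in a single character.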
So, using something like CRC-64 (and base-64'ing the result) should get you in the neighborhood you're looking for. Alternatively, you could use an existing hash algorithm that produces a fairly short digest, like MD5 (128 bits) or SHA-1 (160 bits), and then shorten that further by XORing sections of the digest with other sections. This will increase the chance of collisions, though for a well-mixed hash it should be no worse than (and roughly equivalent to) simply truncating the digest to the same length.
Also, you could include the length of the original data as part of the result, so that messages of different lengths can never collide. For example, XORing the first half of an MD5 digest with the second half would result in 64 bits. Add 32 bits for the length of the data (or fewer if you know that the length will always fit into fewer bits). That would give a 96-bit (12-byte) result that you could then turn into a 24-character hex string.
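The folding scheme just described could look something like this in Python. It is only a sketch: the function name `folded_digest`, the big-endian length field, and wrapping the length at 32 bits are my assumptions, not something the answer specifies.

```python
import base64
import hashlib
import struct


def folded_digest(data: bytes) -> bytes:
    """XOR the two halves of an MD5 digest and append a 32-bit length (12 bytes total)."""
    digest = hashlib.md5(data).digest()    # 16 bytes
    half = len(digest) // 2
    folded = bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))  # 8 bytes
    # Assumption: big-endian, wrapped at 2**32 for very large inputs.
    length = struct.pack(">I", len(data) & 0xFFFFFFFF)                   # 4 bytes
    return folded + length


result = folded_digest(b"hello world")
hex_form = result.hex()                                        # 24 hex characters
b64_form = base64.urlsafe_b64encode(result).decode("ascii")    # 16 characters, no padding

print(hex_form)
print(b64_form)
```

Because 12 bytes is a multiple of 3, the base-64 form needs no `=` padding and comes out at 16 characters.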
Alternately, you could use base-64 encoding to make it even shorter.
Just summarizing an answer that was helpful to me (noting @erasmospunk's comment about using base-64 encoding). My goal was to have a short string that was mostly unique.
Look, it's not really a traditional hash. It has useful properties where the user can declare the string size in places where there is extremely limited buffer space on certain OSes (e.g. Mac OSX) AND the result has to fit within the limited domain of real filenames AND they don't want to just truncate the name because that WOULD cause collisions (but shorter strings are left alone).
A cryptographic hash is not always the right answer and std::hash is also not always the right answer. – Nov 24 '16 at 20:04.
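One way to read the last comment is as a length-limited name shortener. Here is a minimal sketch of that idea; it is my interpretation, not the commenter's code. The name `fit_name`, the `-` separator, the 8-character tag, and the choice of SHA-256 are all hypothetical. I use a hex tag rather than base-64 because it stays distinct on case-insensitive filesystems.

```python
import hashlib


def fit_name(name: str, max_len: int, tag_len: int = 8) -> str:
    """Return `name` unchanged if it fits, otherwise a readable prefix plus a hash tag."""
    if len(name) <= max_len:
        return name                       # shorter strings are left alone
    if max_len <= tag_len + 1:
        raise ValueError("max_len is too small to hold a prefix and a tag")
    # The tag is derived from the full name, so two long names that share
    # a prefix still (very likely) end up with different results.
    tag = hashlib.sha256(name.encode("utf-8")).hexdigest()[:tag_len]
    keep = max_len - tag_len - 1          # room left for the readable prefix
    return f"{name[:keep]}-{tag}"


print(fit_name("report.txt", 20))
print(fit_name("a_very_long_generated_file_name_0001.txt", 20))
print(fit_name("a_very_long_generated_file_name_0002.txt", 20))
```

Note that this counts characters, not bytes; a real filesystem limit is usually in bytes, so non-ASCII names would need extra care.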