String.hashCode() is defined as:
public int hashCode() { int h = hash; if (h == 0 && value.length > 0) { char val[] = value; for (int i = 0; i < value.length; i++) { h = 31 * h + val[i]; } hash = h; } return h; }For a string A of length N (a0, a1,..., aN-2, aN-1),
this can be written in a closed form as: 31^(N-1)a0 + 31^(N-2)a1 + ... + 31aN-2 + aN-1.
Pad by zeros
The first observation is that 0 is a valid character and prepending arbitrary number of 0 characters doesn't change the value of the hash. The following code takes advantage of this:public static String padByZeroes(String original, int numberOfExtraZeroes) { int length = original.length(); char[] chars = new char[length + numberOfExtraZeroes]; for (int i = 0; i < length; i++) { chars[i + numberOfExtraZeroes] = original.charAt(i); } return new String(chars); }padByZeroes produces strings of length greater then the original with identical hashcode.
Pair Perturbation
As a starting point to this approach we take a string B which is identical to A: (b0, b1,..., bN-2, bN-1). We then modify two consecutive character of B without affecting it's hashcode.If the modified characters are are at index K and K+1 we have: hashCode(A)-hashCode(B) = ... = 31(aK - bK)+(aK+1 - bK+1).
We want the last expression to be zero, that is: 31(aK - bK) = bK+1 - aK+1. As we are free to choose bK and bK+1, to make things simple (and reduce the chance of overflowing/underflowing the two byte value of char) we choose bK = aK - 1 which gives us bK+1 = aK+1 + 31.
public static String pairPerturbation(String original, int pairIndex) { int length = original.length(); if (pairIndex > length - 2) { throw new IllegalArgumentException("pairIndex can't be greater then " + (length - 2)); } char[] chars = original.toCharArray(); chars[pairIndex] = (char) (original.charAt(pairIndex) - 1); chars[pairIndex + 1] = (char) (original.charAt(pairIndex + 1) + 31); return new String(chars); }Here is a link to the Gist of the code and tests: https://gist.github.com/sorokod/a6848e6cd4c9fe32a523#file-stringhashcollisions
This seems to be the trigger for changes made to Maps in Java 8 as described in http://openjdk.java.net/jeps/180
ReplyDelete