String.hashCode() is defined as:
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
For a string A of length N (a0, a1,..., aN-2, aN-1),this can be written in a closed form as: 31^(N-1)a0 + 31^(N-2)a1 + ... + 31aN-2 + aN-1.
Pad by zeros
The first observation is that 0 is a valid character and prepending arbitrary number of 0 characters doesn't change the value of the hash. The following code takes advantage of this: public static String padByZeroes(String original, int numberOfExtraZeroes) {
int length = original.length();
char[] chars = new char[length + numberOfExtraZeroes];
for (int i = 0; i < length; i++) {
chars[i + numberOfExtraZeroes] = original.charAt(i);
}
return new String(chars);
}
padByZeroes produces strings of length greater then the original with identical hashcode.Pair Perturbation
As a starting point to this approach we take a string B which is identical to A: (b0, b1,..., bN-2, bN-1). We then modify two consecutive character of B without affecting it's hashcode.If the modified characters are are at index K and K+1 we have: hashCode(A)-hashCode(B) = ... = 31(aK - bK)+(aK+1 - bK+1).
We want the last expression to be zero, that is: 31(aK - bK) = bK+1 - aK+1. As we are free to choose bK and bK+1, to make things simple (and reduce the chance of overflowing/underflowing the two byte value of char) we choose bK = aK - 1 which gives us bK+1 = aK+1 + 31.
public static String pairPerturbation(String original, int pairIndex) {
int length = original.length();
if (pairIndex > length - 2) {
throw new IllegalArgumentException("pairIndex can't be greater then " + (length - 2));
}
char[] chars = original.toCharArray();
chars[pairIndex] = (char) (original.charAt(pairIndex) - 1);
chars[pairIndex + 1] = (char) (original.charAt(pairIndex + 1) + 31);
return new String(chars);
}
Here is a link to the Gist of the code and tests: https://gist.github.com/sorokod/a6848e6cd4c9fe32a523#file-stringhashcollisions
This seems to be the trigger for changes made to Maps in Java 8 as described in http://openjdk.java.net/jeps/180
ReplyDelete