Even Faster Hash Codes

I wrote a post recently about how disappointed I was that the optimiser couldn’t outsmart some clever Java code for computing hash codes. Well, here’s a faster hash code along the same lines.

The hash code implemented in Arrays.hashCode is a polynomial hash, it applies to any data type with a positional interpretation. It takes the general form $latex \sum_{i=0}^{n}x_{i}31^{n - i}$ where $latex x_0 = 1$. In other words, it’s a dot product of the elements of the array and some powers of 31. Daniel Lemire’s implementation makes it explicit to the optimiser, in a way it won’t otherwise infer, that this operation is data parallel. If it’s really just a dot product it can be made even more obvious at the cost of a loss of flexibility.

Imagine you are processing fixed or limited length strings (VARCHAR(255) or an URL) or coordinates of a space of fixed dimension. Then you could pre-compute the coefficients in an array and write the hash code explicitly as a dot product. Java 9 uses AVX instructions for dot products, so it should be very fast.

public class FixedLengthHashCode {

    private final int[] coefficients;

    public FixedLengthHashCode(int maxLength) {
        this.coefficients = new int[maxLength + 1];
        coefficients[maxLength] = 1;
        for (int i = maxLength - 1; i >= 0; --i) {
            coefficients[i] = 31 * coefficients[i + 1];
        }
    }

    public int hashCode(int[] value) {
        int result = coefficients[0];
        for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) {
            result += coefficients[i + 1] * value[i];
        }
        return result;
    }
}

This is really explicit, unambiguously parallelisable, and the results are remarkable.

Benchmark	Mode	Threads	Samples	Score	Score Error (99.9%)	Unit	Param: size
HashCode.BuiltIn	thrpt	1	10	10.323026	0.223614	ops/us	100
HashCode.BuiltIn	thrpt	1	10	0.959246	0.038900	ops/us	1000
HashCode.BuiltIn	thrpt	1	10	0.096005	0.001836	ops/us	10000
HashCode.FixedLength	thrpt	1	10	20.186800	0.297590	ops/us	100
HashCode.FixedLength	thrpt	1	10	2.314187	0.082867	ops/us	1000
HashCode.FixedLength	thrpt	1	10	0.227090	0.005377	ops/us	10000
HashCode.Unrolled	thrpt	1	10	13.250821	0.752609	ops/us	100
HashCode.Unrolled	thrpt	1	10	1.503368	0.058200	ops/us	1000
HashCode.Unrolled	thrpt	1	10	0.152179	0.003541	ops/us	10000

Modifying the algorithm slightly to support limited variable length arrays degrades performance slightly, but there are seemingly equivalent implementations which do much worse.

public class FixedLengthHashCode {

    private final int[] coefficients;

    public FixedLengthHashCode(int maxLength) {
        this.coefficients = new int[maxLength + 1];
        coefficients[0] = 1;
        for (int i = 1; i <= maxLength; ++i) {
            coefficients[i] = 31 * coefficients[i - 1];
        }
    }

    public int hashCode(int[] value) {
        final int max = value.length;
        int result = coefficients[max];
        for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) {
            result += coefficients[max - i - 1] * value[i];
        }
        return result;
    }
}

Benchmark	Mode	Threads	Samples	Score	Score Error (99.9%)	Unit	Param: size
FixedLength	thrpt	1	10	19.172574	0.742637	ops/us	100
FixedLength	thrpt	1	10	2.233006	0.115285	ops/us	1000
FixedLength	thrpt	1	10	0.227451	0.012231	ops/us	10000

The benchmark code is at github.