# Sorting Unsigned Integers Faster in Java

I discovered a curious resource for audio-visualising sort algorithms, which is exciting for two reasons. The first is that I finally feel like I understand Alexander Scriabin: he was not a composer. He discovered Tim Sort 80 years before Tim Peters and called it Black Mass. (If you aren’t familiar with the piece, fast-forward to 1:40 to hear the congruence.)

The second reason was that I noticed Radix Sort (LSD). While it was an affront to my senses, it used a mere 800 array accesses and *no comparisons*! I was unaware of this algorithm so delved deeper and implemented it for integers, and benchmarked my code against `Arrays.sort`

.

#### Radix Sort

It is taken as given by many (myself included, or am I just projecting my thoughts on to others?) that $latex O(n \log n)$ is the best you can do in a sort algorithm. But this is actually only true for sort algorithms which depend on comparison. If you can afford to restrict the data types your sort algorithm supports to types with a positional interpretation (`java.util`

can’t because it needs to be ubiquitous and maintainable), you can get away with a linear time algorithm.

Radix sort, along with the closely related counting sort, does not use comparisons. Instead, the data is interpreted as a fixed length string of symbols. For each position, the cumulative histogram of symbols is computed to calculate sort indices. While the data needs to be scanned several times, the algorithm scales linearly and the overhead of the multiple scans is amortised for large arrays.

As you can see on Wikipedia, there are two kinds of radix sort: *Least Significant Digit* and *Most Significant Digit*. This dichotomy relates to the order the (representational) string of symbols is traversed in. I implemented and benchmarked the LSD version for integers.

#### Implementation

The implementation interprets an integer as the concatenation of n bit string symbols of fixed size size 32/n. It performs n passes over the array, starting with the least significant bits, which it modifies in place. For each pass the data is scanned three times, in order to:

- Compute the cumulative histogram over the symbols in their natural sort order.
- Copy the value with symbol k to the mth position in a buffer, where m is defined by the cumulative density of k.
- Copy the buffer back into the original array.

The implementation, which won’t work unless the chunks are proper divisors of 32, is below. The bonus (or caveat) is that it automatically supports unsigned integers. The code could be modified slightly to work with signed integers at a performance cost.

```
import java.util.Arrays;
public class RadixSort {
private final int radix;
public RadixSort() {
this(Byte.SIZE);
}
public RadixSort(int radix) {
assert 32 % radix== 0;
this.radix= radix;
}
public void sort(int[] data) {
int[] histogram = new int[(1 << radix) + 1];
int shift = 0;
int mask = (1 << radix) - 1;
int[] copy = new int[data.length];
while (shift < Integer.SIZE) {
Arrays.fill(histogram, 0);
for (int i = 0; i < data.length; ++i) {
++histogram[((data[i] & mask) >> shift) + 1];
}
for (int i = 0; i < 1 << radix; ++i) {
histogram[i + 1] += histogram[i];
}
for (int i = 0; i < data.length; ++i) {
copy[histogram[(data[i] & mask) >> shift]++] = data[i];
}
for (int i = 0; i < data.length; ++i) {
data[i] = copy[i];
}
shift += radix;
mask <<= radix;
}
}
}
```

The time complexity is obviously linear, a temporary buffer *is* allocated, but in comparison to `Arrays.sort`

it looks fairly spartan. Instinctively, cache locality looks fairly poor because the second inner loop of the three jumps all over the place. Will this implementation beat `Arrays.sort`

(for integers)?

#### Benchmark

The algorithm is measured using arrays of random positive integers, for which both algorithms are equivalent, from a range of sizes. This isn’t always the best idea (the Tim Sort algorithm comes into its own on nearly sorted data), so take the result below with a pinch of salt. Care must be taken to copy the array in the benchmark since both algorithms are in-place.

```
public void launchBenchmark(String... jvmArgs) throws Exception {
Options opt = new OptionsBuilder()
.include(this.getClass().getName() + ".*")
.mode(Mode.SampleTime)
.mode(Mode.Throughput)
.timeUnit(TimeUnit.MILLISECONDS)
.measurementTime(TimeValue.seconds(10))
.warmupIterations(10)
.measurementIterations(10)
.forks(1)
.shouldFailOnError(true)
.shouldDoGC(true)
.jvmArgs(jvmArgs)
.resultFormat(ResultFormatType.CSV)
.build();
new Runner(opt).run();
}
@Benchmark
public void Arrays_Sort(Data data, Blackhole bh) {
int[] array = Arrays.copyOf(data.data, data.size);
Arrays.sort(array);
bh.consume(array);
}
@Benchmark
public void Radix_Sort(Data data, Blackhole bh) {
int[] array = Arrays.copyOf(data.data, data.size);
data.radixSort.sort(array);
bh.consume(array);
}
@State(Scope.Thread)
public static class Data {
@Param({"100", "1000", "10000", "100000", "1000000"})
int size;
int[] data;
RadixSort radixSort = new RadixSort();
@Setup(Level.Trial)
public void init() {
data = createArray(size);
}
}
public static int[] createArray(int size) {
int[] array = new int[size];
Random random = new Random(0);
for (int i = 0; i < size; ++i) {
array[i] = Math.abs(random.nextInt());
}
return array;
}
```

Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
---|---|---|---|---|---|---|---|

Arrays_Sort | thrpt | 1 | 10 | 1304.687189 | 147.380334 | ops/ms | 100 |

Arrays_Sort | thrpt | 1 | 10 | 78.518664 | 9.339994 | ops/ms | 1000 |

Arrays_Sort | thrpt | 1 | 10 | 1.700208 | 0.091836 | ops/ms | 10000 |

Arrays_Sort | thrpt | 1 | 10 | 0.133835 | 0.007146 | ops/ms | 100000 |

Arrays_Sort | thrpt | 1 | 10 | 0.010560 | 0.000409 | ops/ms | 1000000 |

Radix_Sort | thrpt | 1 | 10 | 404.807772 | 24.930898 | ops/ms | 100 |

Radix_Sort | thrpt | 1 | 10 | 51.787409 | 4.881181 | ops/ms | 1000 |

Radix_Sort | thrpt | 1 | 10 | 6.065590 | 0.576709 | ops/ms | 10000 |

Radix_Sort | thrpt | 1 | 10 | 0.620338 | 0.068776 | ops/ms | 100000 |

Radix_Sort | thrpt | 1 | 10 | 0.043098 | 0.004481 | ops/ms | 1000000 |

Arrays_Sort | sample | 1 | 3088586 | 0.000902 | 0.000018 | ms/op | 100 |

Arrays_Sort·p0.00 | sample | 1 | 1 | 0.000394 | ms/op | 100 | |

Arrays_Sort·p0.50 | sample | 1 | 1 | 0.000790 | ms/op | 100 | |

Arrays_Sort·p0.90 | sample | 1 | 1 | 0.000791 | ms/op | 100 | |

Arrays_Sort·p0.95 | sample | 1 | 1 | 0.001186 | ms/op | 100 | |

Arrays_Sort·p0.99 | sample | 1 | 1 | 0.001974 | ms/op | 100 | |

Arrays_Sort·p0.999 | sample | 1 | 1 | 0.020128 | ms/op | 100 | |

Arrays_Sort·p0.9999 | sample | 1 | 1 | 0.084096 | ms/op | 100 | |

Arrays_Sort·p1.00 | sample | 1 | 1 | 4.096000 | ms/op | 100 | |

Arrays_Sort | sample | 1 | 2127794 | 0.011876 | 0.000037 | ms/op | 1000 |

Arrays_Sort·p0.00 | sample | 1 | 1 | 0.007896 | ms/op | 1000 | |

Arrays_Sort·p0.50 | sample | 1 | 1 | 0.009872 | ms/op | 1000 | |

Arrays_Sort·p0.90 | sample | 1 | 1 | 0.015408 | ms/op | 1000 | |

Arrays_Sort·p0.95 | sample | 1 | 1 | 0.024096 | ms/op | 1000 | |

Arrays_Sort·p0.99 | sample | 1 | 1 | 0.033920 | ms/op | 1000 | |

Arrays_Sort·p0.999 | sample | 1 | 1 | 0.061568 | ms/op | 1000 | |

Arrays_Sort·p0.9999 | sample | 1 | 1 | 0.894976 | ms/op | 1000 | |

Arrays_Sort·p1.00 | sample | 1 | 1 | 4.448256 | ms/op | 1000 | |

Arrays_Sort | sample | 1 | 168991 | 0.591169 | 0.001671 | ms/op | 10000 |

Arrays_Sort·p0.00 | sample | 1 | 1 | 0.483840 | ms/op | 10000 | |

Arrays_Sort·p0.50 | sample | 1 | 1 | 0.563200 | ms/op | 10000 | |

Arrays_Sort·p0.90 | sample | 1 | 1 | 0.707584 | ms/op | 10000 | |

Arrays_Sort·p0.95 | sample | 1 | 1 | 0.766976 | ms/op | 10000 | |

Arrays_Sort·p0.99 | sample | 1 | 1 | 0.942080 | ms/op | 10000 | |

Arrays_Sort·p0.999 | sample | 1 | 1 | 2.058273 | ms/op | 10000 | |

Arrays_Sort·p0.9999 | sample | 1 | 1 | 7.526102 | ms/op | 10000 | |

Arrays_Sort·p1.00 | sample | 1 | 1 | 46.333952 | ms/op | 10000 | |

Arrays_Sort | sample | 1 | 13027 | 7.670135 | 0.021512 | ms/op | 100000 |

Arrays_Sort·p0.00 | sample | 1 | 1 | 6.356992 | ms/op | 100000 | |

Arrays_Sort·p0.50 | sample | 1 | 1 | 7.634944 | ms/op | 100000 | |

Arrays_Sort·p0.90 | sample | 1 | 1 | 8.454144 | ms/op | 100000 | |

Arrays_Sort·p0.95 | sample | 1 | 1 | 8.742502 | ms/op | 100000 | |

Arrays_Sort·p0.99 | sample | 1 | 1 | 9.666560 | ms/op | 100000 | |

Arrays_Sort·p0.999 | sample | 1 | 1 | 12.916883 | ms/op | 100000 | |

Arrays_Sort·p0.9999 | sample | 1 | 1 | 28.037900 | ms/op | 100000 | |

Arrays_Sort·p1.00 | sample | 1 | 1 | 28.573696 | ms/op | 100000 | |

Arrays_Sort | sample | 1 | 1042 | 96.278673 | 0.603645 | ms/op | 1000000 |

Arrays_Sort·p0.00 | sample | 1 | 1 | 86.114304 | ms/op | 1000000 | |

Arrays_Sort·p0.50 | sample | 1 | 1 | 94.896128 | ms/op | 1000000 | |

Arrays_Sort·p0.90 | sample | 1 | 1 | 104.293990 | ms/op | 1000000 | |

Arrays_Sort·p0.95 | sample | 1 | 1 | 106.430464 | ms/op | 1000000 | |

Arrays_Sort·p0.99 | sample | 1 | 1 | 111.223767 | ms/op | 1000000 | |

Arrays_Sort·p0.999 | sample | 1 | 1 | 134.172770 | ms/op | 1000000 | |

Arrays_Sort·p0.9999 | sample | 1 | 1 | 134.742016 | ms/op | 1000000 | |

Arrays_Sort·p1.00 | sample | 1 | 1 | 134.742016 | ms/op | 1000000 | |

Radix_Sort | sample | 1 | 2240042 | 0.002941 | 0.000033 | ms/op | 100 |

Radix_Sort·p0.00 | sample | 1 | 1 | 0.001578 | ms/op | 100 | |

Radix_Sort·p0.50 | sample | 1 | 1 | 0.002368 | ms/op | 100 | |

Radix_Sort·p0.90 | sample | 1 | 1 | 0.003556 | ms/op | 100 | |

Radix_Sort·p0.95 | sample | 1 | 1 | 0.004344 | ms/op | 100 | |

Radix_Sort·p0.99 | sample | 1 | 1 | 0.011056 | ms/op | 100 | |

Radix_Sort·p0.999 | sample | 1 | 1 | 0.027232 | ms/op | 100 | |

Radix_Sort·p0.9999 | sample | 1 | 1 | 0.731127 | ms/op | 100 | |

Radix_Sort·p1.00 | sample | 1 | 1 | 5.660672 | ms/op | 100 | |

Radix_Sort | sample | 1 | 2695825 | 0.018553 | 0.000038 | ms/op | 1000 |

Radix_Sort·p0.00 | sample | 1 | 1 | 0.013424 | ms/op | 1000 | |

Radix_Sort·p0.50 | sample | 1 | 1 | 0.016576 | ms/op | 1000 | |

Radix_Sort·p0.90 | sample | 1 | 1 | 0.025280 | ms/op | 1000 | |

Radix_Sort·p0.95 | sample | 1 | 1 | 0.031200 | ms/op | 1000 | |

Radix_Sort·p0.99 | sample | 1 | 1 | 0.050944 | ms/op | 1000 | |

Radix_Sort·p0.999 | sample | 1 | 1 | 0.082944 | ms/op | 1000 | |

Radix_Sort·p0.9999 | sample | 1 | 1 | 0.830295 | ms/op | 1000 | |

Radix_Sort·p1.00 | sample | 1 | 1 | 6.660096 | ms/op | 1000 | |

Radix_Sort | sample | 1 | 685589 | 0.145695 | 0.000234 | ms/op | 10000 |

Radix_Sort·p0.00 | sample | 1 | 1 | 0.112512 | ms/op | 10000 | |

Radix_Sort·p0.50 | sample | 1 | 1 | 0.128000 | ms/op | 10000 | |

Radix_Sort·p0.90 | sample | 1 | 1 | 0.196608 | ms/op | 10000 | |

Radix_Sort·p0.95 | sample | 1 | 1 | 0.225792 | ms/op | 10000 | |

Radix_Sort·p0.99 | sample | 1 | 1 | 0.309248 | ms/op | 10000 | |

Radix_Sort·p0.999 | sample | 1 | 1 | 0.805888 | ms/op | 10000 | |

Radix_Sort·p0.9999 | sample | 1 | 1 | 1.818141 | ms/op | 10000 | |

Radix_Sort·p1.00 | sample | 1 | 1 | 14.401536 | ms/op | 10000 | |

Radix_Sort | sample | 1 | 60843 | 1.641961 | 0.005783 | ms/op | 100000 |

Radix_Sort·p0.00 | sample | 1 | 1 | 1.251328 | ms/op | 100000 | |

Radix_Sort·p0.50 | sample | 1 | 1 | 1.542144 | ms/op | 100000 | |

Radix_Sort·p0.90 | sample | 1 | 1 | 2.002944 | ms/op | 100000 | |

Radix_Sort·p0.95 | sample | 1 | 1 | 2.375680 | ms/op | 100000 | |

Radix_Sort·p0.99 | sample | 1 | 1 | 3.447030 | ms/op | 100000 | |

Radix_Sort·p0.999 | sample | 1 | 1 | 5.719294 | ms/op | 100000 | |

Radix_Sort·p0.9999 | sample | 1 | 1 | 8.724165 | ms/op | 100000 | |

Radix_Sort·p1.00 | sample | 1 | 1 | 13.074432 | ms/op | 100000 | |

Radix_Sort | sample | 1 | 4846 | 20.640787 | 0.260926 | ms/op | 1000000 |

Radix_Sort·p0.00 | sample | 1 | 1 | 14.893056 | ms/op | 1000000 | |

Radix_Sort·p0.50 | sample | 1 | 1 | 18.743296 | ms/op | 1000000 | |

Radix_Sort·p0.90 | sample | 1 | 1 | 26.673152 | ms/op | 1000000 | |

Radix_Sort·p0.95 | sample | 1 | 1 | 30.724915 | ms/op | 1000000 | |

Radix_Sort·p0.99 | sample | 1 | 1 | 40.470446 | ms/op | 1000000 | |

Radix_Sort·p0.999 | sample | 1 | 1 | 63.016600 | ms/op | 1000000 | |

Radix_Sort·p0.9999 | sample | 1 | 1 | 136.052736 | ms/op | 1000000 | |

Radix_Sort·p1.00 | sample | 1 | 1 | 136.052736 | ms/op | 1000000 |

The table tells an interesting story. `Arrays.sort`

is vastly superior for small arrays (the arrays most people have), but for large arrays the custom implementation comes into its own. Interestingly, this is consistent with the computer science. If you need to sort large arrays of (unsigned) integers and care about performance, think about implementing radix sort.