In my last post, I described an integer compression algorithm (Simple, from now on) that can be easily implemented in just a few lines of code. I also wrote about its possible strengths and weaknesses, though only as an exercise in pure intuition. In this post, I am going to show you the results I got from testing and benchmarking the algorithm.

I have written a quick prototype in Go to check and challenge my assumptions about the algorithm. It comprises a library that implements it, some benchmarks to measure compression and decompression speed, and a command-line tool to test the compression ratio.

I chose Go for no particular reason, other than that I am familiar with the language and that it makes it quite easy to create tests and benchmarks.

Neither the benchmarks nor the command-line tool run Simple in isolation; both compare it against other existing compression techniques. In particular, they also execute a general-purpose compression library (Zlib, through the compress/zlib package) and a couple of integer compression algorithms, FastPFor and Bit Packing, which are included in this library. These last two methods are also extended with delta encoding. So the final list of compression algorithms is: Simple, Zlib, FastPFor, BP32, DeltaFastPFor and DeltaBP32.
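
To give a taste of what the comparison involves, here is a minimal sketch of the Zlib baseline. The function name is mine, not the prototype's; only the compress/zlib calls are the real standard-library API:

package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// zlibCompress runs the general-purpose baseline: it pushes a byte slice
// through compress/zlib and returns the compressed result.
func zlibCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes pending data into buf
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := zlibCompress(bytes.Repeat([]byte{1, 2, 3, 4}, 100))
	fmt.Println(len(out), err) // far fewer than 400 bytes for repetitive input
}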

Uniformly distributed random values have been employed as input for the compression algorithms. To generate them, I used the math/rand package, which provides plenty of functions that yield this type of random number (as explained here). Once generated, the values were sorted in ascending order, resulting in a sorted sequence of 32-bit random integers.
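
A generator along these lines would do the job. This is a sketch, and genInput is a name of my choosing, not necessarily what the prototype calls it:

package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// genInput produces n uniformly distributed 32-bit values, sorted ascending.
func genInput(n int) []uint32 {
	values := make([]uint32, n)
	for i := range values {
		values[i] = rand.Uint32() // uniform over the whole 32-bit range
	}
	sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })
	return values
}

func main() {
	fmt.Println(genInput(10))
}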

I tried to feed the algorithms the exact same data, but that proved impossible since they take different input types. For instance, Zlib works with []byte, whereas the other integer compression methods expect either []int32 or []uint32. So I ended up making some transformations to get a set of inputs as similar as possible.
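
For Zlib's []byte input, for instance, the transformation can be as simple as laying out each integer as four bytes. Again a sketch, with toBytes being my name for it:

package main

import (
	"encoding/binary"
	"fmt"
)

// toBytes lays out each uint32 as 4 little-endian bytes, producing the
// []byte form that Zlib expects from the same logical input.
func toBytes(values []uint32) []byte {
	buf := make([]byte, 4*len(values))
	for i, v := range values {
		binary.LittleEndian.PutUint32(buf[4*i:], v)
	}
	return buf
}

func main() {
	fmt.Println(toBytes([]uint32{1, 256})) // [1 0 0 0 0 1 0 0]
}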

Results

Environment

The following results were obtained on a laptop running Ubuntu Desktop 18.10 with a Core i7-6700HQ CPU @ 2.60GHz x 8. You might get different absolute values in a different environment, but they should be similar relative to one another.

Compression and decompression speed

Here is the output of the benchmarks:

BenchmarkCompressSimple-8            	      10	 105658901 ns/op
BenchmarkCompressZlib-8              	       5	 335552325 ns/op
BenchmarkCompressBP32-8              	     100	  19725261 ns/op
BenchmarkCompressFastpfor-8          	      20	  77703391 ns/op
BenchmarkCompressDeltaBP32-8         	      50	  24439807 ns/op
BenchmarkCompressDeltaFastpfor-8     	      10	 119863099 ns/op
BenchmarkDecompressSimple-8          	      10	 104063865 ns/op
BenchmarkDecompressZlib-8            	       2	 729746805 ns/op
BenchmarkDecompressBP32-8            	     100	  13691662 ns/op
BenchmarkDecompressFastpfor-8        	     100	  14427686 ns/op
BenchmarkDecompressDeltaBP32-8       	     100	  14145599 ns/op
BenchmarkDecompressDeltaFastpfor-8   	      50	  27173598 ns/op
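
For context, each of these follows the standard Go testing benchmark pattern. Here is a minimal sketch of what the Simple compression benchmark might look like, with a stub compress standing in for the prototype's actual encoder; the real file would live alongside the library with a _test.go suffix:

package simple

import (
	"math/rand"
	"sort"
	"testing"
)

// compress is a placeholder so the sketch compiles; the prototype's real
// encoder would replace this stub.
func compress(values []uint32) []byte { return make([]byte, len(values)) }

// BenchmarkCompressSimple times the encoder over a fixed one-million-value
// input, keeping the setup cost out of the measurement.
func BenchmarkCompressSimple(b *testing.B) {
	input := make([]uint32, 1000000)
	for i := range input {
		input[i] = rand.Uint32()
	}
	sort.Slice(input, func(i, j int) bool { return input[i] < input[j] })
	b.ResetTimer() // exclude input generation from the timing
	for i := 0; i < b.N; i++ {
		compress(input)
	}
}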

On the compression side, Simple is clearly slower than BP32, FastPFor and DeltaBP32. However, it is slightly faster than DeltaFastPFor and about 3x faster than Zlib.

In terms of decompression speed, Simple is much slower than the other specialised integer compression methods. Nonetheless, it is about 7x faster than Zlib.

Compression ratio

As I mentioned earlier, I have created a small command-line tool, called compare-compression-ratio, as part of the prototype. Its purpose is to run the set of algorithms against inputs of different sizes and then show the compression ratio for each case. It displays the results as a table, so it is quite easy to determine at a glance which methods achieve the best compression ratio.

Running the program for the list of sizes 10,100,1000,10000,100000,1000000 will give you the following output:

      integers     10    100   1000   10000   100000   1000000
         bytes   1.00   1.00   1.00    1.00     1.00      1.00
        simple   0.95   1.05   1.07    1.07     1.07      1.07
          zlib   0.75   0.96   1.00    1.01     1.08      1.17
          bp32   0.77   0.81   1.02    1.06     1.06      1.06
    delta bp32   0.83   1.01   1.32    1.57     1.87      2.32
      fastpfor   0.77   0.81   1.02    1.06     1.06      1.06
delta fastpfor   0.83   1.01   1.33    1.60     1.92      2.39

The first two rows are informative. The integers row gives the actual input size (the number of integers), while the bytes row establishes the baseline for comparison: uncompressed data, which by definition has a ratio of 1.00. Note that the ratio is input size divided by output size, so higher is better.
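
Expressed in code, that convention is simply:

package main

import "fmt"

// ratio follows the convention in the tables: input size divided by output
// size, so 2.0 means the compressed output is half the original.
func ratio(inputLen, outputLen int) float64 {
	return float64(inputLen) / float64(outputLen)
}

func main() {
	// 10 integers are 40 input bytes; 42 output bytes gives the 0.95
	// shown for Simple in the small-data table further down.
	fmt.Printf("%.2f\n", ratio(40, 42))
}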

As you can see, the compression ratios for the Simple algorithm are somewhat better than those of BP32 and FastPFor, yet far from the results of DeltaBP32, DeltaFastPFor and even Zlib for almost all input sizes, except the smallest ones.

The case for small data

When I was carrying out the tests, I soon realised that there is a use case in which Simple outperforms the rest of the methods: small data. Small data here refers to a short sequence of just a few values, around 100 integers or fewer. At that scale, Simple achieves the best compression ratio.

Let’s run compare-compression-ratio with small data:

         integers      5     10     20     30     40     50     80    100    150
            bytes   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
           simple   0.83   0.95   1.00   1.02   1.02   1.05   1.06   1.05   1.05
             zlib   0.62   0.74   0.85   0.88   0.91   0.93   0.95   0.96   0.97
             bp32   0.62   0.77   0.77   0.79   0.78   0.81   0.82   0.81   0.99
       delta bp32   0.71   0.83   0.95   0.97   0.98   1.00   1.00   1.01   1.17
         fastpfor   0.62   0.77   0.77   0.79   0.78   0.81   0.82   0.81   0.96
   delta fastpfor   0.71   0.83   0.95   0.97   0.98   1.00   1.00   1.01   1.14

Since the compression ratio is around 1.0 and the overall gain is poor (just a few bytes at most), it is probably more useful to look at the plain output size instead. compare-compression-ratio lets you display the results that way. This is what it looks like when running the program for the same sizes:

      integers    5   10    20    30    40    50    80   100   150
         bytes   20   40    80   120   160   200   320   400   600
        simple   24   42    80   118   157   191   303   381   569
          zlib   32   54    94   136   176   216   336   416   616
          bp32   32   52   104   152   204   248   388   492   604
    delta bp32   28   48    84   124   164   200   320   396   512
      fastpfor   32   52   104   152   204   248   388   492   628
delta fastpfor   28   48    84   124   164   200   320   396   528

With just a handful of values, Simple achieves some compression, i.e. the output size is less than the input size. For example, we save 2 bytes with 30 integers, 9 bytes with 50, 17 bytes with 80, and so on. For any of the other methods, you need to move to an input size of 100 integers to see a tiny gain.

However, if you look at the first columns of the Simple row, you will still see output sizes greater than the input sizes. The good news is that we can do better.

By default, compare-compression-ratio adds a 32-bit header to indicate the length of the compressed sequence. Those 32 bits seem excessive to me for such a short sequence. If we cut the header down to 8 bits, Simple can still compress a list of up to 255 values, which is more than enough for our definition of small data. Here is the resulting table when using an 8-bit header (a code sketch of the idea follows below):

         integers    5   10    20    30    40    50    80   100   150
            bytes   20   40    80   120   160   200   320   400   600
           simple   21   39    77   115   154   188   300   378   566
             zlib   32   54    94   136   176   216   336   416   616
             bp32   32   52   104   152   204   248   388   492   604
       delta bp32   28   48    84   124   164   200   320   396   512
         fastpfor   32   52   104   152   204   248   388   492   628
   delta fastpfor   28   48    84   124   164   200   320   396   528

At some point between input sizes of 5 and 10, we start compressing. It is not bad at all.
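
To make the header change concrete, here is a minimal sketch. withSmallHeader is my name for it, and the payload encoding itself stays untouched; only the length prefix shrinks from four bytes to one:

package main

import (
	"errors"
	"fmt"
)

// withSmallHeader prepends a single-byte length header to an
// already-compressed payload. One byte caps the sequence at 255 values,
// which is plenty for the small-data case.
func withSmallHeader(n int, payload []byte) ([]byte, error) {
	if n > 255 {
		return nil, errors.New("8-bit header cannot describe more than 255 values")
	}
	out := make([]byte, 0, 1+len(payload))
	out = append(out, byte(n)) // header: number of encoded integers
	return append(out, payload...), nil
}

func main() {
	out, err := withSmallHeader(5, []byte{0xca, 0xfe, 0xba}) // dummy payload
	fmt.Println(out, err) // [5 202 254 186] <nil>
}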

Conclusion

In general, Simple is slower than FastPFor, BP32 and their delta versions. This is probably because those algorithms are optimised for speed, while Simple is just a prototype. That doesn't mean Simple would eventually beat them, but rather that there is still a lot of room for improvement.

With regard to compression level, Simple mostly doesn't get close to the results obtained by Zlib, DeltaBP32 and DeltaFastPFor; its compression ratio flattens out at input sizes of around 1000. However, it gets amazing results for small data, as covered in the previous section.

I started writing this post (and the previous one) with the sole goal of exploring a pretty elementary technique. There were no expectations beyond some thoughts about how it would perform. Simple has turned out to be a great integer compression method, especially for small inputs. So overall, I am very satisfied with what I have found.