Fast lossless compression of numbers stored as text
If large amounts of data have to be read and written at maximum speed while also achieving good compression, an algorithm tailored to the problem is recommended (see e.g. fc16). For some applications, however, it is advantageous if the data can be read by text-based programs such as grep, sed or awk.
fc4 (fast compression 4 bit)
Since numbers coded as text occupy on average slightly more than twice as much memory as numbers stored as binary data, large amounts of such data should be compressed before being written to the hard disk. The program fc4 from 256.systems compresses files that mainly contain numbers in ASCII format at over 4 GB/s per core and decompresses them at 8 GB/s per core (measured on a notebook with an Intel i5 processor). Its compression ratios are similar to those of gzip, lzma and zstd.
Benchmark 1: Compression and decompression speeds of different programs
Benchmark 2: Compression rate and overall performance of different programs
Fast conversion from text to numbers and back
To save numbers from memory as text, they have to be converted from binary format to decimal digits and then to ASCII characters. On Linux, the C standard library provides the printf family of functions (printf, fprintf, sprintf, snprintf and others) for this purpose.
To convert numbers from text files back to binary format, there are atoi (ASCII to int, 32 bits on typical platforms), atol (ASCII to long, 64 bits on 64-bit Linux) and atof (ASCII to double, a 64-bit float). Although these functions are highly optimized, they can become the bottleneck of a program in many cases.
256.systems provides the functions fatoi, fatof, print_int and, soon, print_float, which are 4 to 6 times faster than the corresponding standard library functions (see the benchmark results below).
Benchmark: Conversion of numbers: text --> binary; C Standard Library vs. fc4
Benchmark: Conversion of numbers: binary --> text; C Standard Library vs. fc4
Contact
For questions about compressing numbers stored as ASCII text, fast conversion routines or similar algorithms, please contact 256.systems.