Tool for sorting binary records
Joris van Rantwijk a6fe2a199c Add some automated tests 2022-07-03 15:59:00 +02:00


SortBin: a tool for sorting binary records

SortBin is a tool for sorting arrays of binary data records. It is similar to the Unix sort utility, but where sort works with lines of text, SortBin works with fixed-length binary data records.

SortBin reads input from a file, sorts it, and writes the sorted data to an output file. These files contain flat, raw arrays of binary data records.

Records are interpreted as fixed-length strings of 8-bit unsigned integers. These records are sorted in lexicographic order. This means that records are sorted by their first byte, then records with equal first bytes are sorted by their second bytes, and so on.
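The ordering described above can be illustrated with a small in-memory sketch. This is not SortBin's implementation; the function name `sort_records` is made up for this example. It relies on the fact that Python compares `bytes` objects byte by byte as unsigned integers, which matches the lexicographic order described here.

```python
def sort_records(data: bytes, record_size: int) -> bytes:
    """Sort a flat array of fixed-length records in lexicographic byte order.

    Illustrative helper only; it is not part of SortBin.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    if len(data) % record_size != 0:
        raise ValueError("data length is not a multiple of record_size")
    # Split the flat buffer into fixed-length records.
    records = [data[i:i + record_size] for i in range(0, len(data), record_size)]
    # bytes objects compare by first byte, then second byte, and so on,
    # with each byte treated as an unsigned 8-bit integer.
    records.sort()
    return b"".join(records)


# Four 2-byte records: 02 00, 01 ff, 01 80, 00 7f
example = bytes([0x02, 0x00, 0x01, 0xFF, 0x01, 0x80, 0x00, 0x7F])
print(sort_records(example, 2).hex(" "))  # prints "00 7f 01 80 01 ff 02 00"
```

Note that 01 80 sorts before 01 ff because the bytes are unsigned; with signed bytes the order of those two records would be reversed.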

SortBin can sort very large data files which do not fit in memory. In such cases, a temporary file is used to store intermediate results. The program starts by separately sorting blocks of data that do fit in memory. It then iteratively merges these blocks into larger sorted blocks until the complete file is sorted.
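The general technique described above is an external merge sort. The following is a minimal sketch of that idea, not SortBin's actual code: the names `external_sort` and `records_per_block` are assumptions for this example, and it simplifies the merge phase to a single k-way merge over all sorted blocks, whereas SortBin is described as merging iteratively into larger blocks.

```python
import heapq
import os
import tempfile


def _read_run(f, start, end, record_size):
    """Yield the records of one sorted block stored at bytes [start, end) of f."""
    pos = start
    while pos < end:
        # Several generators share one file handle, so seek before every read.
        f.seek(pos)
        yield f.read(record_size)
        pos += record_size


def external_sort(in_path, out_path, record_size, records_per_block):
    """Sort fixed-length records using a temporary file for intermediate blocks.

    Hypothetical helper for illustration; reading one record at a time
    is simple but slow.
    """
    if os.path.getsize(in_path) % record_size != 0:
        raise ValueError("input size is not a multiple of record_size")
    block_bytes = record_size * records_per_block
    run_bounds = []
    with tempfile.TemporaryFile() as tmp:
        # Phase 1: sort blocks that fit in memory and append them to the temp file.
        with open(in_path, "rb") as src:
            while True:
                block = src.read(block_bytes)
                if not block:
                    break
                records = sorted(
                    block[i:i + record_size]
                    for i in range(0, len(block), record_size)
                )
                start = tmp.tell()
                tmp.write(b"".join(records))
                run_bounds.append((start, tmp.tell()))
        # Phase 2: merge all sorted blocks into the output file.
        runs = [_read_run(tmp, s, e, record_size) for s, e in run_bounds]
        with open(out_path, "wb") as dst:
            for record in heapq.merge(*runs):
                dst.write(record)
```

A single merge pass keeps the sketch short. With a very large number of blocks, merging in several passes limits how many blocks are read at once.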

This program is designed to work with relatively short data records, up to about 20 bytes. Sorting larger records should work, but may be inefficient.

Usage

SortBin has only been tested on Linux.

To use SortBin, you must compile the source code. Clone the repository, then build as follows:

git clone https://github.com/jorisvr/sortbin.git
cd sortbin
make

You can now sort data like this:

build/sortbin --size=10 --memory=2G input.dat output.dat
  • --size specifies the record size in bytes (10 in this example)
  • --memory specifies the amount of RAM that SortBin may use (2G in this example)
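To try the command above, you need an input file whose size is a multiple of the record size. The following Python sketch is not part of SortBin; the helper names `write_random_records` and `is_sorted` are made up here. It creates a file of random records and checks that an output file is in lexicographic order.

```python
import os


def write_random_records(path, num_records, record_size):
    """Write num_records random fixed-length records to path."""
    with open(path, "wb") as f:
        f.write(os.urandom(num_records * record_size))


def is_sorted(path, record_size):
    """Return True if the records in path are in non-decreasing byte order."""
    prev = None
    with open(path, "rb") as f:
        while True:
            record = f.read(record_size)
            if not record:
                return True
            if len(record) != record_size:
                raise ValueError("file size is not a multiple of record_size")
            if prev is not None and record < prev:
                return False
            prev = record


# Possible workflow (file names match the command above):
#   write_random_records("input.dat", 1_000_000, 10)
#   ... run: build/sortbin --size=10 --memory=2G input.dat output.dat
#   is_sorted("output.dat", 10)
```

The check reads one record at a time, so it also works for files that do not fit in memory.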