Gottlieb t Freitag

The incredible speed of memchr

December 10, 2022 - simon

Introduction

On my current project SeqAn3 we deal with a lot of parsing of file formats. These file formats are heavily text-based formats, e.g: fasta, .fastq, .sam, .vcf. A very typical operation we apply is finding a new line in a huge buffer. Think of something like:

char* str = "ACGTACGTACGTACGTACGTACGT\nACGTACGTACGTACGTACGT\n"
// find the position of the next new line

So what is the fastest way of finding these new lines?

Options

The functions at hand vary. The first two contestants are memchr and rawmemchr. These functions are part of the standard C library.

Another possibility is to write naive handwritten loops and see if the compiler can optimize these. These functions will are named mymemchr and myrawmemchr.

The next two candidates are std::string::find and std::string_view::find. These can been seen as the C++ equivalents to the C function.

At last, we also check how well std::find over a std::string is performing, just to make sure that there is no speed difference.

Results

The exact code can be viewed on godbolt. It uses the benchmark library from google. The program is recompiled with different lengths of string. Everything is compiled with g++ -march=native -O3 -DNDEBUG and ran on a 5900HS Ryzen processor.

-------------------------------------------------------------
length of string          22            148           1324
-------------------------------------------------------------
_memchr                 2.41 ns        3.17 ns        8.76 ns
_rawmemchr              2.38 ns        3.55 ns        7.58 ns
_mymemchr               6.49 ns       38.00 ns      294.00 ns
_myrawmemchr            4.98 ns       43.50 ns      343.00 ns
_string_find            2.39 ns        3.10 ns        9.11 ns
_string_view_find       2.19 ns        2.85 ns        8.61 ns
_find                   3.93 ns       21.00 ns      187.00 ns

We can see that memchr, rawmemchr, string::find and string_view::find are in a similar level of performance. The differences are probably just inaccuracies of the measurement. But the slowness of mymemchr and myrawmemchr is clear, showing that looping over by buffers for parsing is a lot slower than turning to tools like std::string_view::find. Very unexpected is the slowness of std::find. For some reason, it is still faster than my hand-rolled functions, but is very far away from the speed of the most optimized routines.

My takeaway as a C++ programmer is, using std::string::find and std::string_view::find is what I want in probably almost all cases. There is no benefit to C functions. Also avoid using handwritten loops, which are easily created when using ranges/views. For small strings we see improvements of 2-3x, for mid-long strings, it is around 10x, and for long strings, we see over 30x speed improvements.

Appendix

Code

Link to godbolt or here if godbolt ever looses the code:

#include <benchmark/benchmark.h>

#include <cstring>
#include <string>
#include <iostream>

void* mymemchr(char* s, char c, int i) {
    while(i--) {
        if (*s++  == c) {
            return s-1;
        }
    }
    return nullptr;
}

void* myrawmemchr(char* s, char c) {
    while(true) {
        if (*s++  == c) {
            return s-1;
        }
    }
    return nullptr;
}
//static std::string _x = "hellouiaeuiaeuiaeuiae$";
//static std::string _x = "hellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaee$";
static std::string _x = "hhehhellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehhellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaee$";


struct A {
    A() {
        std::cout << "length of string: " << _x.size() << "\n";
    }
} a;


static void _memchr(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(memchr(_x.data(), '$', _x.size()));
  }
}
BENCHMARK(_memchr);

static void _rawmemchr(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(rawmemchr(_x.data(), '$'));
  }
}
BENCHMARK(_rawmemchr);

static void _mymemchr(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(mymemchr(_x.data(), '$', _x.size()));
  }
}
BENCHMARK(_mymemchr);

static void _myrawmemchr(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(myrawmemchr(_x.data(), '$'));
  }
}
BENCHMARK(_myrawmemchr);

static void _string_find(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(_x.find('$'));
  }
}
BENCHMARK(_string_find);


static void _string_view_find(benchmark::State& state) {
  // Code before the loop is not measured
  std::string_view sv = _x;
  for (auto _ : state) {
    benchmark::DoNotOptimize(sv.find('$'));
  }
}
BENCHMARK(_string_view_find);

static void _find(benchmark::State& state) {
  // Code before the loop is not measured
  for (auto _ : state) {
    benchmark::DoNotOptimize(std::find(begin(_x), end(_x), '$'));
  }
}
BENCHMARK(_find);


BENCHMARK_MAIN();