The incredible speed of memchr
December 10, 2022 - simon
Introduction
On my current project SeqAn3 we deal with a lot of parsing of file formats. These file formats are heavily text-based formats, e.g: fasta, .fastq, .sam, .vcf. A very typical operation we apply is finding a new line in a huge buffer. Think of something like:
char* str = "ACGTACGTACGTACGTACGTACGT\nACGTACGTACGTACGTACGT\n"
// find the position of the next new line
So what is the fastest way of finding these new lines?
Options
The functions at hand vary. The first two contestants are memchr
and rawmemchr
. These functions are part of the standard C library.
Another possibility is to write naive handwritten loops and see if the compiler can optimize these. These functions will are named mymemchr
and myrawmemchr
.
The next two candidates are std::string::find
and std::string_view::find
. These can been seen as the C++ equivalents to the C function.
At last, we also check how well std::find
over a std::string is performing, just to make sure that there is no speed difference.
Results
The exact code can be viewed on godbolt. It uses the benchmark
library from google.
The program is recompiled with different lengths of string.
Everything is compiled with g++ -march=native -O3 -DNDEBUG
and ran on a 5900HS Ryzen processor.
-------------------------------------------------------------
length of string 22 148 1324
-------------------------------------------------------------
_memchr 2.41 ns 3.17 ns 8.76 ns
_rawmemchr 2.38 ns 3.55 ns 7.58 ns
_mymemchr 6.49 ns 38.00 ns 294.00 ns
_myrawmemchr 4.98 ns 43.50 ns 343.00 ns
_string_find 2.39 ns 3.10 ns 9.11 ns
_string_view_find 2.19 ns 2.85 ns 8.61 ns
_find 3.93 ns 21.00 ns 187.00 ns
We can see that memchr
, rawmemchr
, string::find
and string_view::find
are in a similar level of performance. The differences are probably just inaccuracies of the measurement.
But the slowness of mymemchr
and myrawmemchr
is clear, showing that looping over by buffers for parsing is a lot slower than turning to tools like std::string_view::find
.
Very unexpected is the slowness of std::find
. For some reason, it is still faster than my hand-rolled functions, but is very far away from the speed of the most optimized routines.
My takeaway as a C++ programmer is, using std::string::find
and std::string_view::find
is what I want in probably almost all cases. There is no
benefit to C functions. Also avoid using handwritten loops, which are easily created when using ranges/views.
For small strings we see improvements of 2-3x, for mid-long strings, it is around 10x, and for long strings, we see over 30x speed improvements.
Appendix
Code
Link to godbolt or here if godbolt ever looses the code:
#include <benchmark/benchmark.h>
#include <cstring>
#include <string>
#include <iostream>
void* mymemchr(char* s, char c, int i) {
while(i--) {
if (*s++ == c) {
return s-1;
}
}
return nullptr;
}
void* myrawmemchr(char* s, char c) {
while(true) {
if (*s++ == c) {
return s-1;
}
}
return nullptr;
}
//static std::string _x = "hellouiaeuiaeuiaeuiae$";
//static std::string _x = "hellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaee$";
static std::string _x = "hhehhellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehhellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeeellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiahellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaehellouiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaeuiaee$";
struct A {
A() {
std::cout << "length of string: " << _x.size() << "\n";
}
} a;
static void _memchr(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(memchr(_x.data(), '$', _x.size()));
}
}
BENCHMARK(_memchr);
static void _rawmemchr(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(rawmemchr(_x.data(), '$'));
}
}
BENCHMARK(_rawmemchr);
static void _mymemchr(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(mymemchr(_x.data(), '$', _x.size()));
}
}
BENCHMARK(_mymemchr);
static void _myrawmemchr(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(myrawmemchr(_x.data(), '$'));
}
}
BENCHMARK(_myrawmemchr);
static void _string_find(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(_x.find('$'));
}
}
BENCHMARK(_string_find);
static void _string_view_find(benchmark::State& state) {
// Code before the loop is not measured
std::string_view sv = _x;
for (auto _ : state) {
benchmark::DoNotOptimize(sv.find('$'));
}
}
BENCHMARK(_string_view_find);
static void _find(benchmark::State& state) {
// Code before the loop is not measured
for (auto _ : state) {
benchmark::DoNotOptimize(std::find(begin(_x), end(_x), '$'));
}
}
BENCHMARK(_find);
BENCHMARK_MAIN();