Cartesius00

Reputation: 24384

Alignment and performance

Do strcmp (for comparing char * strings) and memcmp (for everything else) run faster on x86_64 when the memory blocks are aligned in some way (and if so, how should they be aligned)? Does libc use SSE for these routines?

Upvotes: 5

Views: 201

Answers (2)

Coren

Reputation: 5637

If comparison performance is your concern, take a look at the well-known Boyer-Moore algorithm and this post from the GNU grep author, Mike Haertel.

He explains how to search a block of data very quickly.

His summary is quite clear about what to do:

  • Use Boyer-Moore (and unroll its inner loop a few times).
  • Roll your own unbuffered input using raw system calls. Avoid copying the input bytes before searching them. (Do, however, use buffered output. The normal grep scenario is that the amount of output is small compared to the amount of input, so the overhead of output buffer copying is small, while savings due to avoiding many small unbuffered writes can be large.)
  • Don't look for newlines in the input until after you've found a match.
  • Try to set things up (page-aligned buffers, page-sized read chunks, optionally use mmap) so the kernel can ALSO avoid copying the bytes.
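To make the first bullet concrete, here is a minimal sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. It is not the code GNU grep uses, and the function name horspool_find is made up for illustration. The inner loop is left un-unrolled to keep it short.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search: a simplified Boyer-Moore variant that
 * uses only the bad-character shift table.
 * Returns the offset of the first match of needle in haystack,
 * or (size_t)-1 if there is no match.
 * horspool_find is a hypothetical name, not a libc function. */
size_t horspool_find(const unsigned char *haystack, size_t hlen,
                     const unsigned char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(haystack != NULL && needle != NULL);
    if (nlen == 0)
        return 0;               /* an empty needle matches at offset 0 */
    if (nlen > hlen)
        return (size_t)-1;

    /* A byte that does not occur in the needle lets us skip nlen bytes. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = nlen;
    /* Bytes of the needle (except the last) shift by their distance
     * from the end of the needle. */
    for (i = 0; i + 1 < nlen; i++)
        shift[needle[i]] = nlen - 1 - i;

    pos = 0;
    while (pos <= hlen - nlen) {
        /* Compare the window against the needle from right to left. */
        size_t j = nlen;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1])
            j--;
        if (j == 0)
            return pos;         /* whole needle matched */
        /* Skip ahead based on the last byte of the current window. */
        pos += shift[haystack[pos + nlen - 1]];
    }
    return (size_t)-1;
}
```

The point of the shift table is that most windows are rejected after looking at a single byte, so long needles are searched in far fewer than hlen byte comparisons on typical input.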

Upvotes: 0

Pete Kirkham

Reputation: 49321

It depends, but on architectures where alignment matters or where SIMD instructions are available, typically the routines will operate on leading bytes, then do as many wide aligned operations as the data allows, then operate on trailing bytes.
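As a rough illustration of that leading / wide / trailing structure, here is a portable sketch in plain C. It is not how any particular libc implements memcmp (real implementations are usually hand-tuned and may use SIMD registers rather than 8-byte words). The name sketch_memcmp and the choice of an 8-byte word are assumptions made for the example. Only the first pointer is brought to an aligned address; the second may still be unaligned, so both words are loaded through memcpy, which is safe for any alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-by-byte comparison, used for the head, the tail, and to find
 * the first differing byte inside a mismatching word. */
static int compare_bytes(const unsigned char *a, const unsigned char *b,
                         size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical memcmp-like routine showing the three phases. */
int sketch_memcmp(const void *pa, const void *pb, size_t n)
{
    const unsigned char *a = pa;
    const unsigned char *b = pb;
    size_t head;
    int r;

    assert(pa != NULL && pb != NULL);

    /* 1. Leading bytes: advance until a sits on an 8-byte boundary. */
    head = (size_t)((8 - ((uintptr_t)a & 7)) & 7);
    if (head > n)
        head = n;
    r = compare_bytes(a, b, head);
    if (r != 0)
        return r;
    a += head;
    b += head;
    n -= head;

    /* 2. Wide operations: compare 8 bytes at a time while possible. */
    while (n >= 8) {
        uint64_t wa, wb;
        memcpy(&wa, a, 8);      /* a is aligned here */
        memcpy(&wb, b, 8);      /* b may not be; memcpy handles that */
        if (wa != wb)
            return compare_bytes(a, b, 8);
        a += 8;
        b += 8;
        n -= 8;
    }

    /* 3. Trailing bytes: whatever is left, fewer than 8 bytes. */
    return compare_bytes(a, b, n);
}
```

For short inputs, phases 1 and 3 can make up most of the work, which is why alignment tends to matter more for small blocks than for large ones.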

Whether the leading and trailing bytes contribute significantly to the processing time for your data can be determined by experiment.
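One way to run such an experiment is sketched below: time memcmp on two equal blocks that start a chosen number of bytes past a 64-byte boundary, then compare offsets and sizes. The function name time_memcmp_at_offset, the 64-byte boundary, and the use of clock() are choices made for this example, not anything prescribed by libc. The call goes through a volatile function pointer so the compiler cannot fold or hoist the repeated calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Times `reps` calls of memcmp on two equal n-byte blocks that each
 * start `offset` bytes past a 64-byte boundary (offset must be < 64).
 * Returns elapsed CPU time in seconds, or -1.0 if allocation fails.
 * time_memcmp_at_offset is a hypothetical helper for this experiment. */
double time_memcmp_at_offset(size_t n, size_t offset, long reps)
{
    /* Volatile pointer and sink stop the optimizer removing the calls. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    volatile int sink = 0;
    unsigned char *raw_a, *raw_b, *a, *b;
    clock_t start, end;
    long i;

    assert(offset < 64);

    /* Over-allocate so there is room to align and then offset. */
    raw_a = malloc(n + 128);
    raw_b = malloc(n + 128);
    if (raw_a == NULL || raw_b == NULL) {
        free(raw_a);
        free(raw_b);
        return -1.0;
    }

    /* Round each pointer up to a 64-byte boundary, then add the offset. */
    a = raw_a + ((64 - ((uintptr_t)raw_a & 63)) & 63) + offset;
    b = raw_b + ((64 - ((uintptr_t)raw_b & 63)) & 63) + offset;

    /* Equal contents force memcmp to scan the whole block. */
    memset(a, 0x5a, n);
    memset(b, 0x5a, n);

    start = clock();
    for (i = 0; i < reps; i++)
        sink += cmp(a, b, n);
    end = clock();

    free(raw_a);
    free(raw_b);
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with, say, offset 0 and offset 1 for both a small size (a few dozen bytes) and a large one (several kilobytes) gives a first impression. Repeat each measurement several times and use enough repetitions that the result is well above the resolution of clock(), since single runs are noisy.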

Upvotes: 5
