Reputation: 832
To hone my C skills, I downloaded the eglibc source code and came across strncmp. I don't see why the author distinguished the case where n >= 4 and repeated the same test four times.
    int
    STRNCMP (const char *s1, const char *s2, size_t n)
    {
      unsigned char c1 = '\0';
      unsigned char c2 = '\0';

      if (n >= 4)
        {
          size_t n4 = n >> 2;
          do
            {
              c1 = (unsigned char) *s1++;
              c2 = (unsigned char) *s2++;
              if (c1 == '\0' || c1 != c2)
                return c1 - c2;
              c1 = (unsigned char) *s1++;
              c2 = (unsigned char) *s2++;
              if (c1 == '\0' || c1 != c2)
                return c1 - c2;
              c1 = (unsigned char) *s1++;
              c2 = (unsigned char) *s2++;
              if (c1 == '\0' || c1 != c2)
                return c1 - c2;
              c1 = (unsigned char) *s1++;
              c2 = (unsigned char) *s2++;
              if (c1 == '\0' || c1 != c2)
                return c1 - c2;
            } while (--n4 > 0);
          n &= 3;
        }

      while (n > 0)
        {
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          n--;
        }

      return c1 - c2;
    }
Maybe it has something to do with the memory layout, I don't know. Please enlighten me.
Upvotes: 3
Views: 444
Reputation: 385657
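It's an unrolled loop. At the cost of making the binary a little larger, it aims to speed up string comparisons: the loop counter is decremented, tested and branched on once per 4 bytes instead of once per byte, which eliminates 3 decrements, 3 counter tests and 3 branches for every 4 bytes compared. The per-byte checks (c1 == '\0' || c1 != c2) are still needed for each byte. The trailing while loop handles the 0 to 3 bytes left over when n is not a multiple of 4.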
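For comparison, here is a minimal sketch of what the same function looks like without unrolling. The name strncmp_rolled is hypothetical and this is not code from eglibc; it is just the remainder loop from the question applied to all n bytes. It should return the same results as the unrolled version, only with one counter decrement and test per byte.

```c
#include <stddef.h>

/* Hypothetical, non-unrolled equivalent of the function in the question.
   Not taken from eglibc; shown only to make the unrolling visible. */
int strncmp_rolled(const char *s1, const char *s2, size_t n)
{
    unsigned char c1 = '\0';
    unsigned char c2 = '\0';

    /* One counter test, one decrement and one branch per byte compared. */
    while (n > 0) {
        c1 = (unsigned char) *s1++;
        c2 = (unsigned char) *s2++;
        /* Stop at the end of s1 or at the first differing byte. */
        if (c1 == '\0' || c1 != c2)
            return c1 - c2;
        n--;
    }
    /* All n bytes matched (or n was 0), so c1 == c2 and the result is 0. */
    return c1 - c2;
}
```

Whether the unrolled form is measurably faster than this depends on the compiler and target, since an optimizing compiler may unroll the simple loop on its own.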
The optimization could even have been taken a step further by using the same technique as Duff's device, though it's not clear this would actually be faster. From the linked page,
This automatic handling of the remainder may not be the best solution on all systems and compilers – in some cases two loops may actually be faster (one loop, unrolled, to do the main copy, and a second loop to handle the remainder). The problem appears to come down to the ability of the compiler to correctly optimize the device; it may also interfere with pipelining and branch prediction on some architectures. When numerous instances of Duff's device were removed from the XFree86 Server in version 4.0, there was an improvement in performance and a noticeable reduction in size of the executable. Therefore, when considering using this code, it may be worth running a few benchmarks to verify that it actually is the fastest code on the target architecture, at the target optimization level, with the target compiler.
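To illustrate what that would look like, here is a rough sketch of the comparison written in the style of Duff's device. The name strncmp_duff is hypothetical and this is not how eglibc does it; it only shows how the switch jumps into the middle of the unrolled loop so that the remainder is handled by the first, partial pass instead of by a second loop.

```c
#include <stddef.h>

/* Hypothetical Duff's-device-style variant of the function in the question.
   A sketch for illustration only, not code from eglibc. */
int strncmp_duff(const char *s1, const char *s2, size_t n)
{
    unsigned char c1 = '\0';
    unsigned char c2 = '\0';

    if (n == 0)
        return 0;

    /* Number of trips through the loop, rounding up.
       Written this way so that it cannot overflow for very large n. */
    size_t passes = n / 4 + (n % 4 != 0);

    /* Jump into the loop body so the first pass handles the n % 4 remainder. */
    switch (n % 4) {
    case 0:
        do {
            c1 = (unsigned char) *s1++;
            c2 = (unsigned char) *s2++;
            if (c1 == '\0' || c1 != c2)
                return c1 - c2;
            /* fall through */
    case 3:
            c1 = (unsigned char) *s1++;
            c2 = (unsigned char) *s2++;
            if (c1 == '\0' || c1 != c2)
                return c1 - c2;
            /* fall through */
    case 2:
            c1 = (unsigned char) *s1++;
            c2 = (unsigned char) *s2++;
            if (c1 == '\0' || c1 != c2)
                return c1 - c2;
            /* fall through */
    case 1:
            c1 = (unsigned char) *s1++;
            c2 = (unsigned char) *s2++;
            if (c1 == '\0' || c1 != c2)
                return c1 - c2;
        } while (--passes > 0);
    }
    /* All n bytes matched, so c1 == c2 and the result is 0. */
    return c1 - c2;
}
```

As the quoted passage warns, there is no guarantee this beats the two-loop version from the question, so it would need benchmarking on the target compiler and architecture before being preferred.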
Upvotes: 6