Reputation: 1151
I have the following implementation of strstr
NOTE: This code is not mine.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
char *fast_strstr(const char *haystack, const char *needle)
{
if (!*needle) // Empty needle.
return (char *) haystack;
const char needle_first = *needle;
// Runs strchr() on the first section of the haystack as it has a lower
// algorithmic complexity for discarding the first non-matching characters.
haystack = strchr(haystack, needle_first);
if (!haystack) // First character of needle is not in the haystack.
return NULL;
// First characters of haystack and needle are the same now. Both are
// guaranteed to be at least one character long.
// Now computes the sum of the first needle_len characters of haystack
// minus the sum of characters values of needle.
const char *i_haystack = haystack + 1,
*i_needle = needle + 1;
unsigned int sums_diff = *haystack;
bool identical = true;
while (*i_haystack && *i_needle)
{
sums_diff += *i_haystack;
sums_diff -= *i_needle;
identical &= *i_haystack++ == *i_needle++;
}
// i_haystack now references the (needle_len + 1)-th character.
if (*i_needle) // haystack is smaller than needle.
return NULL;
else if (identical)
return (char *) haystack;
size_t needle_len = i_needle - needle;
size_t needle_len_1 = needle_len - 1;
// Loops for the remaining of the haystack, updating the sum iteratively.
const char *sub_start;
for (sub_start = haystack; *i_haystack; i_haystack++)
{
sums_diff -= *sub_start++;
sums_diff += *i_haystack;
// Since the sum of the characters is already known to be equal at that
// point, it is enough to check just needle_len-1 characters for
// equality.
if (
sums_diff == 0
&& needle_first == *sub_start // Avoids some calls to memcmp.
&& memcmp(sub_start, needle, needle_len_1) == 0
)
return (char *) sub_start;
}
return NULL;
}
int main(void)
{
char s[] = "this is a test";
char s2[] = "test";
if(fast_strstr(s, s2) != NULL)
puts("YES!");
else
puts("NOT!");
return 0;
}
This gives an incorrect output with the current entry, where is NOT! instead YES!. This problem occurs only with the last word but the strange thing is that it works it with other strings, any idea why this happens?
Upvotes: 1
Views: 650
Reputation: 6251
sum_diff
should be initialized to 0, since they matched at the first character there is no initial difference.
If you run this in GDB you will find sum_diff = 116
(which is the ASCII value of 't'
.) when it returns instead of 0.
unsigned int sums_diff = 0; // *haystack - *needle (which is 0)
This bug causes it to fail for any haystack where the first character of needle appears earlier in the string and doesn't match completely as this is when you go into the for
loop and rely on sum_diff
.
Upvotes: 0
Reputation: 153338
Code fails if first char
of needle matches a char
in the haystack, but the rest does not.
Try fast_strstr("zhis is a test", "test")
Instead of the last return NULL;
, code needs to try the rest of the haystack after the first matching letter. A recursive solution follows, but certainly a loop within the function could be had.
return fast_strstr(haystack+1, needle); // --> YES!
// return NULL;
Code may be fast on some input, but appears to be O(n*n)
Upvotes: 2