Kevin
Kevin

Reputation: 1151

Implementation of strstr fails with the last word

I have the following implementation of strstr

NOTE: This code is not mine.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

char *fast_strstr(const char *haystack, const char *needle)
{
    if (!*needle) // Empty needle.
        return (char *) haystack;

    const char    needle_first  = *needle;

    // Runs strchr() on the first section of the haystack as it has a lower
    // algorithmic complexity for discarding the first non-matching characters.
    haystack = strchr(haystack, needle_first);
    if (!haystack) // First character of needle is not in the haystack.
        return NULL;

    // First characters of haystack and needle are the same now. Both are
    // guaranteed to be at least one character long.
    // Now computes the sum of the first needle_len characters of haystack
    // minus the sum of characters values of needle.

    const char   *i_haystack = haystack + 1,
                  *i_needle   = needle   + 1;

    unsigned int  sums_diff = *haystack;
    bool          identical = true;

    while (*i_haystack && *i_needle)
    {
        sums_diff += *i_haystack;
        sums_diff -= *i_needle;
        identical &= *i_haystack++ == *i_needle++;
    }

    // i_haystack now references the (needle_len + 1)-th character.

    if (*i_needle) // haystack is smaller than needle.
        return NULL;
    else if (identical)
        return (char *) haystack;

    size_t needle_len = i_needle - needle;
    size_t needle_len_1 = needle_len - 1;

    // Loops for the remaining of the haystack, updating the sum iteratively.
    const char *sub_start;
    for (sub_start = haystack; *i_haystack; i_haystack++)
    {
        sums_diff -= *sub_start++;
        sums_diff += *i_haystack;

        // Since the sum of the characters is already known to be equal at that
        // point, it is enough to check just needle_len-1 characters for
        // equality.
        if (
            sums_diff == 0
            && needle_first == *sub_start // Avoids some calls to memcmp.
            && memcmp(sub_start, needle, needle_len_1) == 0
        )
            return (char *) sub_start;
    }

    return NULL;
}

int main(void)
{
    char s[] = "this is a test";
    char s2[] = "test";

    if(fast_strstr(s, s2) != NULL)
        puts("YES!");
    else
        puts("NOT!");

    return 0;
}

This gives an incorrect output with the current entry, where is NOT! instead YES!. This problem occurs only with the last word but the strange thing is that it works it with other strings, any idea why this happens?

Upvotes: 1

Views: 650

Answers (2)

ryanpattison
ryanpattison

Reputation: 6251

sum_diff should be initialized to 0, since they matched at the first character there is no initial difference. If you run this in GDB you will find sum_diff = 116 (which is the ASCII value of 't'.) when it returns instead of 0.

 unsigned int  sums_diff = 0; // *haystack - *needle (which is 0)

This bug causes it to fail for any haystack where the first character of needle appears earlier in the string and doesn't match completely as this is when you go into the for loop and rely on sum_diff.

Upvotes: 0

chux
chux

Reputation: 153338

Code fails if first char of needle matches a char in the haystack, but the rest does not.
Try fast_strstr("zhis is a test", "test")

Instead of the last return NULL;, code needs to try the rest of the haystack after the first matching letter. A recursive solution follows, but certainly a loop within the function could be had.

return fast_strstr(haystack+1, needle);  // --> YES!
// return NULL;

Code may be fast on some input, but appears to be O(n*n)

Upvotes: 2

Related Questions