More detailed explanation of function that counts the amount of times two strings match, including wildcard char *

Question

I'm a beginner working on an exercise where the aim is to count the amount of times two strings match. When we have two or more stars, multiple string combinations can be suitable. Here are some examples:

"abcbd" & "*b*" match twice : ("a","cbd") and ("abc", "d")

"abc" & "a**" match 3 times : (nothing,"bc"), ("b", "c") and ("bc", nothing)

int nummatch(char *s1, char *s2)
{
    int i;
    int matches;

    if (*s1 == '\0' && *s2 == '\0')
        return (1);
    else if (*s2 == '*')
    {
        i = 0;
        matches = 0;
        while (s1[i] != '\0')
            matches += nummatch(s1 + i++, s2 + 1);
        matches += nummatch(s1 + i, s2 + 1);
        return (matches);
    }
    else if (*s1 == *s2)
        return (nummatch(s1 + 1, s2 + 1));
    else
        return (0);
}

Someone submitted this code and claims it works. I'm struggling to understand how it works. How does it account for overlap? What would happen with a combination explosion like xxxxxxxxxxxxxxxxxxxxa and *x*x*x*x*a were tested?

Eric Postpischil · Accepted Answer

The routine partitions the problem into four possibilities based on the first character in one or both of the strings:

*s1 == '\0' && *s2 == '\0': Both strings are zero length, so the first matches the second, so return 1.
*s2 == '*': The second string has a wildcard character here. This is discussed below.
*s1 == *s2: The second string does not have a wildcard character here (because that was checked above), so that character matches only if the character in the first string is the same as it. If is the same, we use nummatch(s1 + 1, s2 + 1) to count the matches in the remaining parts of the strings and return that.
Otherwise, the second string does not have a wildcard character here, and the character in the first string is different, so there is no match. Return 0.

In the second case, we use a loop to count all the ways the first string can match the second. The * is allowed to match any number of characters in the first string, so we iterate i from 0 up to all the character in the first string and count the possibilities.

If we counted the remaining length of the first string, we might use code like this:

matches = 0;
for (int i = 0; i <= length of first string; ++i)
    matches += nummatch(s1 + i, s2 + 1);

What this says is:

For each i from 0 to the length of the first string, let the * in the second string match the first i characters of the first string.
Then count how many matches there are in the rest of the first string (s1 + i, which is the first string from position i onward) to the rest of the second string (s2 + 1, which is the second string from after this * onward).

However, the code did not count the length of the first string first. Instead, it builds it into the loop. This code:

while (s1[i] != '\0')
    matches += nummatch(s1 + i++, s2 + 1);

keeps incrementing i and counting matches just like the for loop above until the point where s1[i] is the null character that marks the end of the string. At that point, it has counted all the matches where we let * mark any number of characters in the first string less than its full length. So it is missing one case. The next line does that; it counts the matches in the case where the * matches the full first string and the rest of the second string matches the empty string:

matches += nummatch(s1 + i, s2 + 1);

(The rest of the second string can match the empty string either because it is the empty string or because it contains only * wildcard characters.)

What would happen with a combination explosion like xxxxxxxxxxxxxxxxxxxxa and *x*x*x*x*a were tested?

It will individually count all the different ways the strings can match, so execution time will increase proportionately.

More detailed explanation of function that counts the amount of times two strings match, including wildcard char *

Answers (2)

Related Questions