More detailed explanation of function that counts the amount of times two strings match, including wildcard char *

I'm a beginner working on an exercise where the aim is to count the amount of times two strings match. When we have two or more stars, multiple string combinations can be suitable. Here are some examples:

"abcbd" & "*b*" match twice : ("a","cbd") and ("abc", "d")

"abc" & "a**" match 3 times : (nothing,"bc"), ("b", "c") and ("bc", nothing)

int nummatch(char *s1, char *s2)
{
    int i;
    int matches;

    if (*s1 == '\0' && *s2 == '\0')
        return (1);
    else if (*s2 == '*')
    {
        i = 0;
        matches = 0;
        while (s1[i] != '\0')
            matches += nummatch(s1 + i++, s2 + 1);
        matches += nummatch(s1 + i, s2 + 1);
        return (matches);
    }
    else if (*s1 == *s2)
        return (nummatch(s1 + 1, s2 + 1));
    else
        return (0);
}

Someone submitted this code and claims it works. I'm struggling to understand how it works. How does it account for overlap? What would happen with a combination explosion like xxxxxxxxxxxxxxxxxxxxa and *x*x*x*x*a were tested?

Upvotes: 2

Views: 144

Answers (2)

CiaPan
CiaPan

Reputation: 9570

The first if handles the case of both input strings empty. They are considered equal, that is matching, hence the return value is 1 for one match found.

The second if handles the case of s2 starting with an asterisk.
This may match anyting from nothing (an empty 'substring' prefixing s1) to the whole s1. So the for loop iterates through the left argument simulating all possible partial matches of the asterisk. For each attempt it recursively calls the function to count matches of the remaining parts of s1 to the s2 with the starting '*' stripped.
All results add up to make an answer returned upwards on return from recursion.

The third if handles the case when both strings begin with the same character. Then we'll have a match iff the remaining parts match, so the result is determined by matching both strings with their initial character skipped.

Finally, if all conditions are false, we have two strings, at least one of which being non-empty, with different first character and the first char of s2 not being an asterisk. This implies they do not match, hence return 0.

Upvotes: 1

Eric Postpischil
Eric Postpischil

Reputation: 222908

The routine partitions the problem into four possibilities based on the first character in one or both of the strings:

  • *s1 == '\0' && *s2 == '\0': Both strings are zero length, so the first matches the second, so return 1.
  • *s2 == '*': The second string has a wildcard character here. This is discussed below.
  • *s1 == *s2: The second string does not have a wildcard character here (because that was checked above), so that character matches only if the character in the first string is the same as it. If is the same, we use nummatch(s1 + 1, s2 + 1) to count the matches in the remaining parts of the strings and return that.
  • Otherwise, the second string does not have a wildcard character here, and the character in the first string is different, so there is no match. Return 0.

In the second case, we use a loop to count all the ways the first string can match the second. The * is allowed to match any number of characters in the first string, so we iterate i from 0 up to all the character in the first string and count the possibilities.

If we counted the remaining length of the first string, we might use code like this:

matches = 0;
for (int i = 0; i <= length of first string; ++i)
    matches += nummatch(s1 + i, s2 + 1);

What this says is:

  • For each i from 0 to the length of the first string, let the * in the second string match the first i characters of the first string.
  • Then count how many matches there are in the rest of the first string (s1 + i, which is the first string from position i onward) to the rest of the second string (s2 + 1, which is the second string from after this * onward).

However, the code did not count the length of the first string first. Instead, it builds it into the loop. This code:

while (s1[i] != '\0')
    matches += nummatch(s1 + i++, s2 + 1);

keeps incrementing i and counting matches just like the for loop above until the point where s1[i] is the null character that marks the end of the string. At that point, it has counted all the matches where we let * mark any number of characters in the first string less than its full length. So it is missing one case. The next line does that; it counts the matches in the case where the * matches the full first string and the rest of the second string matches the empty string:

matches += nummatch(s1 + i, s2 + 1);

(The rest of the second string can match the empty string either because it is the empty string or because it contains only * wildcard characters.)

What would happen with a combination explosion like xxxxxxxxxxxxxxxxxxxxa and *x*x*x*x*a were tested?

It will individually count all the different ways the strings can match, so execution time will increase proportionately.

Upvotes: 1

Related Questions