Learner
Learner

Reputation: 57

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.

So, for the matchPat function, I need to write a function that will take two arguments, str1 and str2, and return a Boolean indicating whether str1 is in str2. But I have to use an asterisk as a wild card in str1. The * can only be used in str1 and it will represent one or more characters that I need to ignore. Examples of matchPat are as follow:

matchPat ( 'a*t*r', 'anteaters' ) : True

matchPat ( 'a*t*r', 'albatross' ) : True

matchPat ( 'a*t*r', 'artist' ) : False

My current matchPat function can tell whether the characters of str1 are in str2 but I don't really know how I could tell python (by using the * as a wild card) to look for 'a' (the first letter) and after it finds a, skip the next 0 or more characters until it finds the next letter(which would be 't' in the example) and so on.

def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True

Upvotes: 4

Views: 7080

Answers (5)

Will
Will

Reputation: 75625

Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.

You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].

You can find the offset of next occurrence of a substring in a string using the string's find method.

So the problem of wildcard matching becomes:

  1. split the pattern into parts which were separated by astrix
  2. for each part of the pattern
  3. can we find this after the previous part's locations?

You are going to have to cope with corner cases like patterns that start with or end with an asterisk or have two asterisk beside each other and so on. Good luck!

Upvotes: 2

Kamehameha
Kamehameha

Reputation: 5473

The basic idea here is to compare each character in str1 and str2, and if char in str1 is "*", find that character in str2 which is the character next to the "*" in str1.

Assuming that you are not going to use any function, (except find(), which can be implemented easily), this is the hard way (the code is straight-forward but messy, and I've commented wherever possible)-

def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        #Check if the str2 has run it's course.
        if index2 >= len(str2):
            #This needs to be checked,assuming matchPatch("*", "") to be true
            if(len(str2) == 0 and str1 == "*"):
                return True
            return False
        #If c is not "*", then it's normal comparision.
        if c != "*":
            if c != str2[index2]:
                 return False
            index2 += 1
        #If c is "*", then you need to increment str1,
        #search for the next value in str2,
        #and update index2
        else:
            index1 += 1
            if(index1 == len(str1)):
                return True       
            c = str1[index1]
            #Search the character in str2
            i = str2.find(c, index2)
            #If search fails, return False
            if(i == -1):
                return False
            index2 = i + 1
        index1 += 1
    return True

OUTPUT -

print matchPat("abcde", "abcd")
#False
print matchPat("a", "")
#False
print matchPat("", "a")
#True
print matchPat("", "")
#True
print matchPat("abc", "abc")
#True
print matchPat("ab*cd", "abacacd")
#False
print matchPat("ab*cd", "abaascd")
#True
print matchPat ('a*t*r', 'anteater')
#True
print matchPat ('a*t*r', 'albatross')
#True
print matchPat ('a*t*r', 'artist')
#False

Upvotes: 0

user3030010
user3030010

Reputation: 1857

Are you allowed to use regular expressions? If so, the function you're looking for already exists in the re.search function:

import re
bool(re.search('a.t.r', 'anteasters')) # True
bool(re.search('a.t.r', 'artist' ))    # False

And if asterisks are a strict necessity, you can use regular expressions for that, too:

newstr = re.sub('\*', '.', 'a*t*r')    # Replace * with .
bool(re.search(newstr, 'anteasters'))  # Search using the new string

If regular expressions aren't allowed, the simplest way to do that would be to look at substrings of the second string that are the same length as the first string, and compare the two. Something like this:

def matchpat(str1, str2):
    if len(str1) > len(str2): return False  #Can't match if the first string is longer
    for i in range(0, len(str2)-len(str1)+1):
        substring = str2[i:i+len(str1)] # create substring of same length as first string
        for j in range(0, len(str1)):
            matched = False                        # assume False until match is found
            if str1[j] != '*' and str1[j] != substring[j]: # check each character
                break
            matched = True
        if matched == True: break # we don't need to keep searching if we've found a match
    return matched    

Upvotes: -1

holdenweb
holdenweb

Reputation: 37023

There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.

I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.

If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.

Since this is homework I am hoping that gives you enough of an idea to move on.

Upvotes: 1

Al Sweigart
Al Sweigart

Reputation: 12939

Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.

needles = needle.split('*')

Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.

In pseudocode, it'll look like this:

needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true

Upvotes: 0

Related Questions