User

Reputation: 99

Wildcard matching in Python

I have a class called Pattern with two methods, equates and setwildcard. equates returns the index at which a substring first appears in a string, and setwildcard sets a wildcard character in the substring.

So

p = Pattern('xyz')
t = 'xxxxxyz'
p.equates(t)

Returns 4

Also

p = Pattern('x*z', '*')
t = 'xxxxxgzx'
p.equates(t)

Returns 4, because * is the wildcard and can match any letter within t, as long as x and z match. What's the best way to implement this?

Upvotes: 3

Views: 38816

Answers (2)

Shakeel Soogun

Reputation: 334

Regex, as the accepted answer suggests, is one way of handling the problem. However, if you only need simpler patterns (such as Unix shell-style wildcards), the built-in fnmatch library can help:

Expressions:

  • * - matches everything
  • ? - matches any single character
  • [seq] - matches any character in seq
  • [!seq] - matches any character not in seq

So for example, trying to find anything that would match with localhost:

import fnmatch

my_pattern = "http://localhost*"
name_to_check = "http://localhost:8080"

fnmatch.fnmatch(name_to_check, my_pattern) # True

The nice part of this is that / is not considered a special character, so for filename/URL matching this works out quite well without having to pre-escape all slashes!
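As a rough sketch of how the four expressions listed above behave, here are a few checks using fnmatch.fnmatchcase (the case-sensitive variant, chosen so the results don't depend on the operating system). The sample strings are made up for illustration. One caveat for the original question: fnmatch tests the whole string against the pattern and only returns True or False, so it cannot report the index of a match by itself.

```python
import fnmatch

# ? matches exactly one character
single = fnmatch.fnmatchcase("xgz", "x?z")              # True
too_short = fnmatch.fnmatchcase("xz", "x?z")            # False, ? needs one character

# [seq] and [!seq] match one character in / not in the set
in_seq = fnmatch.fnmatchcase("abc.txt", "[ab]*.txt")    # True, starts with a or b
not_in_seq = fnmatch.fnmatchcase("cab.txt", "[!ab]*.txt")  # True, starts with c

# The pattern must cover the entire string, not just a substring
whole = fnmatch.fnmatchcase("xxxxxgzx", "x?z")          # False
padded = fnmatch.fnmatchcase("xxxxxgzx", "*x?z*")       # True
```

If you need the position of the match, as in the question, a regex-based approach is probably the better fit.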

Upvotes: 10

UnlimitedHighground

Reputation: 760

It looks like you're essentially implementing a subset of regular expressions. Luckily, Python has a library for that built-in! If you're not familiar with how regular expressions (or, as their friends call them, regexes) work, I highly recommend you read through the documentation for them.

In any event, the function re.search is, I think, exactly what you're looking for. It takes a pattern to match as its first argument and the string to search in as its second. If the pattern is matched, search returns a match object (shown as SRE_Match in older Python versions and re.Match in newer ones), which conveniently has a start() method that returns the index at which the match starts. If nothing matches, search returns None instead, so check for that before calling start().

To use the data from your example:

 import re
 start_index = re.search(r'x.z', 'xxxxxgzx').start()  # 4

Note that in regexes . (not *) is the single-character wildcard, so you'll have to replace the wildcard in the pattern you're using.
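Putting this together, here is a minimal sketch of how the Pattern class from the question could wrap re.search. Several things are assumptions on my part, since the question doesn't spell them out: the constructor signature (pattern, wildcard=None), setwildcard taking the wildcard character as its only argument, the wildcard matching exactly one character of any kind (including a newline), and equates returning -1 when there is no match. Every non-wildcard character is passed through re.escape so that characters such as . or + in the pattern are matched literally.

```python
import re


class Pattern:
    def __init__(self, pattern, wildcard=None):
        # wildcard=None means "no wildcard": plain substring search
        self.pattern = pattern
        self.wildcard = wildcard

    def setwildcard(self, wildcard):
        # Assumed signature: sets the single character treated as a wildcard
        self.wildcard = wildcard

    def _compile(self):
        # Translate the wildcard to '.', escape everything else literally
        parts = []
        for ch in self.pattern:
            if self.wildcard is not None and ch == self.wildcard:
                parts.append('.')
            else:
                parts.append(re.escape(ch))
        # DOTALL lets the wildcard match newlines too (an assumption)
        return re.compile(''.join(parts), re.DOTALL)

    def equates(self, text):
        match = self._compile().search(text)
        # re.search returns None when nothing matches; -1 is an assumed convention
        return match.start() if match else -1


print(Pattern('xyz').equates('xxxxxyz'))        # 4
print(Pattern('x*z', '*').equates('xxxxxgzx'))  # 4
```

The regex is rebuilt on every call to keep the sketch short; if equates is called often, caching the compiled pattern and invalidating it in setwildcard would be a reasonable refinement.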

Upvotes: 1
