Reputation: 5778
Thought exercise: What is the "best" way to write a Python function that takes a regex pattern or a string to match exactly:
import re
strings = [...]
def do_search(matcher):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if not is_a_regex_pattern(matcher):
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s
So, any ideas for the implementation of is_a_regex_pattern()?
Upvotes: 5
Views: 1540
Reputation: 383688
In Python 3.7, re._pattern_type was renamed to re.Pattern, so https://stackoverflow.com/a/27366172/895245 broke at that point, as re._pattern_type is no longer defined.
While re.Pattern looks nicer and will therefore hopefully be more stable, at the time of writing it is not mentioned at all in the docs: https://docs.python.org/3/library/re.html#regular-expression-objects so maybe it is not a good idea to rely on it.
https://stackoverflow.com/a/46779329/895245 does make some sense. But what if someday the str class adds a .match method and it does something completely different? :-) Ah, the joys of dynamically typed languages.
So I think I'm going with:
import re
_takes_s_or_re_type = type(re.compile(''))
def takes_s_or_re(s_or_re):
    if isinstance(s_or_re, _takes_s_or_re_type):
        return 0
    else:
        return 1
assert takes_s_or_re(re.compile('a.c')) == 0
assert takes_s_or_re('a.c') == 1
as this can only break when a public API breaks.
Tested on Python 3.8.0.
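Plugged into the do_search from the question, that check might look like the sketch below. The name _PATTERN_TYPE and the second strings parameter are assumptions made for illustration, not part of the answer above.

```python
import re

# Derived from a real compiled pattern, so no private or undocumented
# attribute of the re module is needed. (_PATTERN_TYPE is a hypothetical name.)
_PATTERN_TYPE = type(re.compile(''))

def do_search(matcher, strings):
    """Yield the strings matched by matcher (a plain string or a compiled pattern)."""
    if not isinstance(matcher, _PATTERN_TYPE):
        # Plain string: escape it and anchor the end, as in the question.
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s
```

For example, list(do_search('a.c', ['a.c', 'abc'])) keeps only 'a.c', while passing re.compile('a.c') keeps both.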
Upvotes: 0
Reputation: 17244
Or, make it quack:
try:
    does_match = matcher.match(s)
except AttributeError:
    # matcher is a plain string, so let re compile it
    does_match = re.match(matcher, s)
if does_match:
    yield s
In other words, treat matcher as if it were already a compiled regular expression. If that breaks, treat it as a string that needs to be compiled.
This is called duck typing. Not everyone agrees that exceptions should be used like this for routine contingencies; this is the ask-permission versus ask-forgiveness debate. Python is more amenable to forgiveness than most languages.
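As a self-contained generator, the ask-forgiveness approach could be sketched as follows. The function name do_search_eafp is hypothetical, and the string fallback here uses re.fullmatch with re.escape to get the exact match the question asks for, which is an assumption rather than what the fragment above does.

```python
import re

def do_search_eafp(matcher, strings):
    """Yield matching strings; matcher is a compiled pattern or a plain string."""
    for s in strings:
        try:
            # Assume a compiled pattern first.
            does_match = matcher.match(s)
        except AttributeError:
            # A str has no .match method: fall back to an exact comparison
            # by escaping the string and requiring a full match.
            does_match = re.fullmatch(re.escape(matcher), s)
        if does_match:
            yield s
```

One trade-off of this style is that any object with a match method is accepted, whether or not it is really a compiled pattern.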
Upvotes: 1
Reputation:
You can access the _sre.SRE_Pattern type via re._pattern_type (on Python 3.6 and earlier; it was renamed to re.Pattern in 3.7):
if not isinstance(matcher, re._pattern_type):
    matcher = re.compile('%s$' % re.escape(matcher))
Below is a demonstration:
>>> import re
>>> re._pattern_type
<class '_sre.SRE_Pattern'>
>>> isinstance(re.compile('abc'), re._pattern_type)
True
>>>
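Since another answer in this thread reports that re._pattern_type became re.Pattern in Python 3.7, a version-tolerant lookup might be sketched like this. PATTERN_TYPE is a hypothetical name chosen for the example.

```python
import re

try:
    # Python 3.7 and later
    PATTERN_TYPE = re.Pattern
except AttributeError:
    # Older versions, where only the private name exists
    PATTERN_TYPE = re._pattern_type

def is_a_regex_pattern(obj):
    """Return True if obj is a compiled regular expression."""
    return isinstance(obj, PATTERN_TYPE)
```

The try/except keeps the private name as a fallback only, so newer interpreters never touch it.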
Upvotes: 8
Reputation: 42768
You could test whether matcher has a match method:
import re
def do_search(matcher, strings):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if hasattr(matcher, 'match'):
        test = matcher.match
    else:
        test = lambda s: matcher == s
    for s in strings:
        if test(s):
            yield s
You should not use global variables; pass the strings in as a second parameter instead.
Upvotes: 0
Reputation: 5778
Not a string:
def is_a_regex_pattern(s):
    # basestring exists only on Python 2; use str on Python 3
    return not isinstance(s, basestring)
Is a _sre.SRE_Pattern (though that's not importable, so use a gross string match):
def is_a_regex_pattern(s):
    # the class is named 'Pattern' on Python 3.7+
    return s.__class__.__name__ == 'SRE_Pattern'
You can re-compile an SRE_Pattern and it seems to evaluate the same:
def is_a_regex_pattern(s):
    return s == re.compile(s)
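One catch with the re-compile trick is that a plain string which is not a valid regex makes re.compile raise. A hedged sketch that guards against that, assuming (as observed above, not as documented behavior) that re.compile hands a compiled pattern back unchanged:

```python
import re

def is_a_regex_pattern(s):
    """Return True if s survives re.compile() unchanged, i.e. is already compiled."""
    try:
        return s == re.compile(s)
    except re.error:
        # s is a string that is not a valid regex, e.g. '(',
        # so it cannot be a compiled pattern.
        return False
```

Strings that happen to be valid regexes still compile on every call, so this does more work than an isinstance check.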
Upvotes: 0