rcoup

Reputation: 5778

Determine whether Python object is regex or string

Thought exercise: What is the "best" way to write a Python function that takes a regex pattern or a string to match exactly:

import re
strings = [...]

def do_search(matcher):
  """
  Returns strings matching matcher, which can be either a string
  (for exact match) or a compiled regular expression object
  (for more complex matches).
  """
  if not is_a_regex_pattern(matcher):
    matcher = re.compile('%s$' % re.escape(matcher))

  for s in strings:
    if matcher.match(s):
      yield s

So, ideas for the implementation of is_a_regex_pattern()?

Upvotes: 5

Views: 1540

Answers (5)

In Python 3.7, re._pattern_type was renamed to re.Pattern.

https://stackoverflow.com/a/27366172/895245 therefore broke at that point, as re._pattern_type is no longer defined.

While re.Pattern looks nicer and will hopefully be more stable, it is not mentioned at all in the docs (https://docs.python.org/3/library/re.html#regular-expression-objects), so it may not be a good idea to rely on it.

https://stackoverflow.com/a/46779329/895245 does make some sense. But what if someday the str class adds a .match method that does something completely different? :-) Ah, the joys of dynamically typed languages.

So I think I'm going with:

import re

_takes_s_or_re_type = type(re.compile(''))
def takes_s_or_re(s_or_re):
    if isinstance(s_or_re, _takes_s_or_re_type):
        return 0
    else:
        return 1

assert takes_s_or_re(re.compile('a.c')) == 0
assert takes_s_or_re('a.c') == 1

as this can only break when a public API breaks.

Tested on Python 3.8.0.
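
To tie this back to the question, here is a minimal sketch of do_search() using the same type check. The name PATTERN_TYPE and the second parameter strings are assumptions made for illustration, not part of the code above.

```python
import re

# Derive the compiled-pattern type from the public API instead of a private name.
# PATTERN_TYPE is a hypothetical name chosen for this sketch.
PATTERN_TYPE = type(re.compile(''))


def do_search(matcher, strings):
    """Yield the strings that match `matcher` (a str or a compiled pattern)."""
    if not isinstance(matcher, PATTERN_TYPE):
        # Plain string: escape it and anchor the end so only exact matches pass.
        matcher = re.compile('%s$' % re.escape(matcher))
    for s in strings:
        if matcher.match(s):
            yield s


words = ['a.c', 'abc', 'a.cd']
exact = list(do_search('a.c', words))                 # string: exact match only
by_regex = list(do_search(re.compile('a.c'), words))  # pattern: regex semantics
```

With a plain string only 'a.c' is returned, while the compiled pattern also accepts 'abc' and 'a.cd', because match() only anchors at the start.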

Upvotes: 0

Bob Stein

Reputation: 17244

Or, make it quack:

try:
    does_match = matcher.match(s)
except AttributeError:
    does_match = re.match(matcher, s)

if does_match:
    yield s

In other words, treat matcher as if it already were a compiled regular expression. And if that breaks, then treat it like a string that needs to be compiled.

This is called Duck Typing. Not everyone agrees that exceptions should be used like this for routine contingencies. This is the ask-permission versus ask-forgiveness debate. Python is more amenable to forgiveness than most languages.
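
As a rough sketch, the fragment above could sit inside a generator like the one below. The function signature and the strings parameter are assumptions for illustration. Note that a plain string is treated here as regex source, as described above; for the exact match the question asks for, it would need to be passed through re.escape() and anchored first.

```python
import re


def do_search(matcher, strings):
    """Yield strings matched by `matcher`, asking forgiveness rather than permission."""
    for s in strings:
        try:
            # Optimistically assume a compiled pattern (anything with .match).
            does_match = matcher.match(s)
        except AttributeError:
            # No .match attribute: fall back to treating it as a pattern string.
            does_match = re.match(matcher, s)
        if does_match:
            yield s


from_string = list(do_search('ab', ['abc', 'xab', 'ab']))
from_pattern = list(do_search(re.compile('ab$'), ['abc', 'xab', 'ab']))
```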

Upvotes: 1

user2555451

Reputation:

You can access the _sre.SRE_Pattern type via re._pattern_type:

if not isinstance(matcher, re._pattern_type):
    matcher = re.compile('%s$' % re.escape(matcher))

Below is a demonstration:

>>> import re
>>> re._pattern_type
<class '_sre.SRE_Pattern'>
>>> isinstance(re.compile('abc'), re._pattern_type)
True
>>>

Upvotes: 8

Daniel

Reputation: 42768

You could test whether matcher has a match method:

import re

def do_search(matcher, strings):
    """
    Returns strings matching matcher, which can be either a string
    (for exact match) or a compiled regular expression object
    (for more complex matches).
    """
    if hasattr(matcher, 'match'):
        test = matcher.match
    else:
        test = lambda s: matcher==s

    for s in strings:
        if test(s):
            yield s

You should not rely on a global variable for the strings; pass them in as a second parameter instead.
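
A short usage sketch of this approach, condensed so it runs on its own. The sample list candidates is made up for illustration. It shows that a plain string is compared with ==, while a compiled pattern keeps its usual match() semantics.

```python
import re


def do_search(matcher, strings):
    """Condensed variant of the hasattr-based function, for demonstration only."""
    if hasattr(matcher, 'match'):
        test = matcher.match
    else:
        # Plain value: fall back to an equality comparison.
        test = lambda s: matcher == s
    return (s for s in strings if test(s))


candidates = ['a.c', 'abc', 'xyz']  # hypothetical sample data
exact_hits = list(do_search('a.c', candidates))
regex_hits = list(do_search(re.compile('a.c'), candidates))
```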

Upvotes: 0

rcoup

Reputation: 5778

  1. Not a string:

    def is_a_regex_pattern(s):
      return not isinstance(s, basestring)
    
  2. Is a _sre.SRE_Pattern (though that's not importable, so use a gross string match):

    def is_a_regex_pattern(s):
      return s.__class__.__name__ == 'SRE_Pattern'
    
  3. You can re-compile a SRE_Pattern and it seems to evaluate the same.

    def is_a_regex_pattern(s):
      return s == re.compile(s)
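
For Python 3, where basestring no longer exists, options 1 and 3 might look like the sketch below. The function names are hypothetical. Option 1 also reports True for anything that is not a str (bytes, None, numbers), and option 3 raises re.error when handed a string that is not a valid regular expression, so neither is a strict type check.

```python
import re


def is_not_a_string(s):
    # Option 1 for Python 3: check against str instead of basestring.
    return not isinstance(s, str)


def recompiles_to_itself(s):
    # Option 3: re.compile() returns an already compiled pattern unchanged,
    # while compiling a str yields a new object that never equals the str.
    return s == re.compile(s)


pattern = re.compile('a.c')
checks = (
    is_not_a_string(pattern),
    is_not_a_string('a.c'),
    recompiles_to_itself(pattern),
    recompiles_to_itself('a.c'),
)
```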
    

Upvotes: 0
