ytu

Reputation: 1850

How to notate raw string regex with a called instance

I know that Python uses the "r" prefix as raw string notation for regular expressions.

However, I'd like to apply that in a while loop like:

while i < len(organized_texts) and j < len(frag_texts):
    if re.match(frag_texts[j], organized_texts[i]):
        # If frag_texts[j] matches the beginning of organized_texts[i]
        # Do things

The problem is that frag_texts[j] can contain a literal "(", and that's where re.match(frag_texts[j], organized_texts[i]) fails with "error: missing ), unterminated subpattern at position 2".

Apparently I can do neither rfrag_texts[j] nor \frag_texts[j]. I've tried re.match("r'{}'".format(frag_texts[j]), organized_texts[i]) but it gives me the same error too. What options do I have now?

Upvotes: 0

Views: 67

Answers (1)

holdenweb

Reputation: 37103

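Raw strings aren't a different data type; they are just an alternative way to write certain string literals, which makes it simpler to express literal string values in your program code. Since regular expressions often contain backslashes, raw strings are frequently used because they avoid the need to write \\ for each backslash. That is why there is nothing like rfrag_texts[j]: the r prefix applies only to literals written in source code, not to strings already held in a variable.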
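As a quick illustration of that point, here is a minimal sketch showing that a raw literal and its escaped spelling produce the same ordinary str value. The variable names are made up for the example.

```python
# Two spellings of the same three-character pattern: backslash, "d", "+".
raw_pattern = r"\d+"
escaped_pattern = "\\d+"

# Both literals evaluate to the same plain str object value.
same_value = raw_pattern == escaped_pattern
same_type = type(raw_pattern) is str

# The "r" prefix exists only in source code; once a string is in a
# variable there is no raw or non-raw version of it.
print(same_value, same_type, len(raw_pattern))
```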

If you want to match arbitrary text fragments then you probably shouldn't be using regular expressions at all. I'd take a look at the startswith string method, since that just does a character-for-character comparison and is therefore much faster. And there's also the equivalent of re.search, should you need it, using the in keyword.
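A rough sketch of what that could look like, assuming the fragments are plain text rather than patterns. The sample strings below are invented for illustration and are not from the question.

```python
# Made-up sample data; frag_text contains a literal "(" on purpose.
organized_text = "f(x) = 2x + 1 is linear"
frag_text = "f(x"

# Literal counterpart of re.match: does the text begin with the fragment?
starts_with_frag = organized_text.startswith(frag_text)

# Literal counterpart of re.search: does the fragment occur anywhere?
contains_frag = "2x + 1" in organized_text

print(starts_with_frag, contains_frag)
```

Neither check interprets "(" or any other character specially, so no escaping is needed.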

You might be interested in this article by a regular expression devotee. Regular expressions are indeed great, but they shouldn't be the first tool you reach for in string matching problems.

If for some reason you do need regular expressions, then you'd be interested in the re.escape function, which escapes special characters so that they are matched literally rather than taking on their regex meaning.
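A minimal sketch of the re.escape approach, again with invented sample strings, in case the surrounding code has to stay regex-based:

```python
import re

# Made-up sample data; frag_text contains a literal "(".
organized_text = "f(x) = 2x + 1 is linear"
frag_text = "f(x"

# Unescaped, the "(" opens a group that is never closed, so re raises an error.
try:
    re.match(frag_text, organized_text)
    unescaped_failed = False
except re.error:
    unescaped_failed = True

# re.escape backslash-escapes the special characters so they match literally.
pattern = re.escape(frag_text)
match = re.match(pattern, organized_text)
matched_text = match.group(0) if match else None

print(unescaped_failed, pattern, matched_text)
```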

Upvotes: 2
