Trouble Using Variables Inside Regular Expressions

What I have looked at already: how to use a variable inside a regular expression

Here is the code I have:

import re

#take user input as an argument
print('Enter 1st Argument: value to strip.')

user_input = input()

#take value to strip off as another argument
print('Enter 2nd Argument: The value to strip off the 1st value.')

strip_value = input()

#Recreate Strip Function
def regex_strip(value,what_to_strip):

    thing2 = 'L'
    what_to_strip = re.compile(r + re.escape(thing2))
    print(what_to_strip)
    #fv = what_to_strip.search('tigers named L')
    #print(fv.group())

regex_strip(user_input, strip_value)

I am expecting the user to submit two values. The first value is the value that will be subject to the stripping. The 2nd value is what is being stripped.

In my function, I am hard-coding values in order to test my regular expression.

Error message I am getting:

name 'r' is not defined

What am I doing wrong?

Edit #1: This is what I have tried:

thing2 = '\d'
what_to_strip = re.compile(re.escape(thing2))
print(what_to_strip)
fv = what_to_strip.search('123')
print(fv.group())

Result:

'NoneType' object has no attribute 'group'

My thoughts: Something is wrong with thing2 = '\d'. I want just '\d', but the printed pattern shows '\\\\d'.

Upvotes: 1

Views: 54

Answers (2)

abarnert

Reputation: 365667

The first problem is that you're confusing raw string literals with strings. A string literal is the way you enter a string in your Python source code, like "abc". You can use an r prefix to make this a raw string literal, like r"a\b\c". That doesn't change what kind of string it is, it just prevents the usual Python source code rules from being applied, so you get actual backslashes and letters instead of special characters like a backspace. So, you can't turn user input into a raw string, but you don't have to—the string is already exactly the letters the user typed.

(This can be a bit confusing, because when you print out a regular expression, you see something like re.compile(r'\.', re.UNICODE). That r isn't really part of the object; it's showing you how you could create exactly the same regular expression object in your source code.)
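As a rough illustration of the point about literals, the sketch below compares a raw literal with an ordinary one. The variable names are made up for the example, and typed_text merely stands in for whatever input() would return if the user typed a backslash followed by d.

```python
import re

# A raw string literal and an ordinary literal with a doubled backslash
# describe exactly the same two-character string.
raw_literal = r"\d"
plain_literal = "\\d"
assert raw_literal == plain_literal
assert len(raw_literal) == 2  # one backslash and the letter d

# Hypothetical stand-in for input(): the user typed \d at the prompt.
# The backslash is already a real character, so nothing needs to be
# converted to "raw" before it is handed to re.compile.
typed_text = "\\d"
pattern = re.compile(typed_text)
match = pattern.search("123")
print(match.group())
```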


The re.escape function is sort of similar, but it's not the same thing. What it does is take a regex pattern and turn it into another pattern with all the regex special characters escaped. So, for example, re.escape('.') gives you \., meaning it will only match an actual . character, rather than matching anything. Since user input can easily contain characters like ., and the user probably isn't asking you to strip every character, you were right to use re.escape here.
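A small sketch of what re.escape does to a string containing a dot may help here. The sample text "a.b" is an assumption chosen for the example, not something from the question.

```python
import re

user_text = "a.b"  # hypothetical input containing a regex special character

# re.escape puts a backslash in front of the dot.
escaped = re.escape(user_text)
assert escaped == r"a\.b"

# The escaped pattern matches only a literal dot.
literal = re.compile(escaped)
assert literal.search("a.b") is not None
assert literal.search("axb") is None

# Without escaping, the dot matches any character.
unescaped = re.compile(user_text)
assert unescaped.search("axb") is not None
```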

So:

re.compile(re.escape(thing2))

When you tested this code with the input \d and tried to search the string 123, it didn't find anything. But that's exactly what you want. If the user types in \d, they're not asking to strip off any digit, they're asking to strip off \ and d.

Of course for some programs, you really do want to take regular expressions from the user. (For example, you might want to write something similar to grep.) In that case, you wouldn't call re.escape.
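To make the difference concrete, here is a minimal sketch that compiles the same typed text both ways. The names typed, subject, as_literal and as_regex are invented for the example, and catching re.error is just one plausible way a grep-like tool might deal with invalid patterns.

```python
import re

typed = r"\d"  # stands in for a user typing a backslash followed by d
subject = "123"

# Escaped: the pattern means "a literal backslash followed by d".
as_literal = re.compile(re.escape(typed))
assert as_literal.search(subject) is None  # calling .group() here would fail
assert as_literal.search(r"abc\d").group() == r"\d"

# Unescaped: the pattern means "any digit".
as_regex = re.compile(typed)
assert as_regex.search(subject).group() == "1"

# Patterns taken directly from a user can be invalid, so it is
# reasonable to catch re.error when compiling them.
try:
    re.compile("(")
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print("invalid pattern:", exc)
```

Checking the result of search for None before calling group avoids the 'NoneType' object has no attribute 'group' error from the question.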


One last thing: When you call '1234'.strip('14'), that doesn't strip off the string '14' from both sides; it strips off any characters that are in the string '14'. In other words, you'll get back '23'. To make this work with a regular expression, you want to turn that '14' into '1|4'. That is, you want to escape each character and then join those characters up with '|' to get the pattern, which should only be allowed to match at the start or end of the string.
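One possible way to put this together is sketched below. It reuses the regex_strip name from the question, but the parameter name chars, the use of \A and \Z as anchors, and the handling of an empty chars string are assumptions of this sketch rather than anything the question specifies.

```python
import re

def regex_strip(value, chars):
    """Sketch of value.strip(chars) built on a regular expression.

    Assumes chars is a string; an empty string leaves value unchanged.
    """
    if not chars:
        return value
    # Escape each character on its own, then join with '|' so the
    # pattern matches any single one of them.
    alternatives = "|".join(re.escape(ch) for ch in chars)
    # Match runs of those characters only at the very start (\A)
    # or the very end (\Z) of the string.
    pattern = re.compile(rf"\A(?:{alternatives})+|(?:{alternatives})+\Z")
    return pattern.sub("", value)

print(regex_strip("1234", "14"))
```

\A and \Z are used instead of ^ and $ because $ can also match just before a trailing newline, which would not mirror str.strip.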

Upvotes: 1

crestniraz

Reputation: 76

You can skip the escape function:

what_to_strip = re.compile(thing2)

:)

Upvotes: 1
