Reading Regular Expressions from a text file

Question

I'm currently trying to write a function that takes two inputs:

1 - The URL for a web page
2 - The name of a text file containing some regular expressions

My function should read the text file line by line (each line being a different regex) and then execute each regex on the web page source code. However, I've run into trouble doing this:

Example: Suppose I want the address contained on a Yelp page with URL = http://www.yelp.com/biz/liberty-grill-cork, where the regex is \\s*([^<]*)\\b\\s*<. In Python, I then run:

address = re.search('\\s*([^<]*)\\b\\s*<', web_page_source_code)

The above works. However, if I write the regex in a text file as is and then read it from that file, it doesn't work. So reading the regex from a text file is what causes the problem. How can I rectify this?

EDIT: This is how I'm reading the regexes from the text file:

with open("test_file.txt","r") as file:
    for regex in file:
        address = re.search(regex, web_page_source_code)

Just to add, the reason I want to read regexes from a text file is so that my function code can stay the same and I can alter my list of regexes easily. If anyone can suggest any other alternatives that would be great.

MightyPork · Accepted Answer

Your string has some backslashes and other characters escaped so that they lose their special meaning in a Python string literal. That escaping belongs to the Python source, not to the regex itself.

You can easily verify what happens by printing the string you load from the file. If the backslashes come out doubled, you did it wrong.
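A minimal sketch of the difference, using the pattern from the question. The sample text and variable names are made up for illustration, and the raw string `from_file_doubled` only simulates what Python would hold after reading a line that was copied into the file with its backslashes still doubled.

```python
import re

# In Python source, a normal string literal needs doubled backslashes;
# a raw string literal does not. Both produce the same characters.
escaped_literal = '\\s*([^<]*)\\b\\s*<'
raw_literal = r'\s*([^<]*)\b\s*<'
print(escaped_literal == raw_literal)

# Text read from a file is never unescaped by Python. If the file holds the
# doubled form, the pattern asks for literal backslashes in the page.
from_file_doubled = r'\\s*([^<]*)\\b\\s*<'

sample = "  14 Example Street  <br>"  # hypothetical stand-in for the page source

print(re.search(raw_literal, sample).group(1))   # the single-backslash pattern matches
print(re.search(from_file_doubled, sample))      # the doubled pattern does not
```

So the file should contain what the raw string contains: single backslashes.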

The text you want in the file is:

File

\s*([^<]*)\b\s*<

Here's how you can check it:

In [1]: a = open('testfile.txt')

In [2]: line = a.readline()

-- this is the line as you'd see it in Python code when properly escaped

In [3]: line
Out[3]: '\\s*([^<]*)\\b\\s*<\n'

-- this is what it actually means (what re will use)

In [4]: print(line)
\s*([^<]*)\b\s*<
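Note that the line read from the file still ends with a newline, which would otherwise become part of the pattern. A rough sketch of a loader that strips it and compiles each line follows. The helper names `load_patterns` and `search_all`, the temporary file, and the sample text are assumptions for illustration, not part of the original code.

```python
import os
import re
import tempfile

def load_patterns(path):
    """Read one regex per line, dropping the trailing newline and blank lines."""
    patterns = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            text = line.rstrip("\n")  # the newline is not part of the regex
            if text:
                patterns.append(re.compile(text))
    return patterns

def search_all(patterns, page_source):
    """Return the first match (or None) for each compiled pattern."""
    return [pattern.search(page_source) for pattern in patterns]

# Demo: write the pattern to a throwaway file exactly as it should appear on disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(r"\s*([^<]*)\b\s*<" + "\n")
demo_patterns = load_patterns(tmp.name)
os.remove(tmp.name)

demo_source = "  14 Example Street  <br>"  # hypothetical stand-in for the page source
results = search_all(demo_patterns, demo_source)
print(results[0].group(1))
```

Only the newline is stripped, so a pattern that intentionally ends in a space keeps it. A line that is not a valid regex raises `re.error` at load time rather than during the search.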
