Reputation: 363
I am trying to read the contents of a file and check whether it matches a list of patterns using regular expressions.
File content:
google.com
https://google.com
yahoo.com
www.yahoo.com
yahoo
My code:
import re
file = 'data_files/test_files/content.txt'
regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")
data = open(file, 'r')
print ("Checking Regex 1")
if regex_1.match(data.read()):
    count_c = len(regex_1.findall(data.read()))
    print ("Matched Regex 1 - " + str(count_c))
print("Checking Regex 2")
if regex_2.match(data.read()):
    count_d = len(regex_2.findall(data.read()))
    print("Matched Regex 2 - " + str(count_d))
else:
    print ("No match found")
Output:
Checking Regex 1
Checking Regex 2
No match found
Couldn't figure out what is wrong here.
Upvotes: 0
Views: 541
Reputation: 780788
Every time you call data.read(), it starts reading from the place in the file where the previous call finished. Since the first call reads the entire file (because you didn't specify a size limit), all the remaining calls start reading at the end of the file, so they read nothing and return an empty string.
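A minimal sketch of that behaviour, assuming an in-memory io.StringIO as a stand-in for the opened file (so it runs without data_files/test_files/content.txt). A text file opened with open() behaves the same way with respect to read() returning an empty string once the end has been reached.

```python
import io

# io.StringIO stands in for the real file; its text mirrors the question's content.txt.
data = io.StringIO("google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n")

first = data.read()   # reads everything and leaves the position at the end
second = data.read()  # starts at the end, so there is nothing left to read
print(repr(second))   # ''

data.seek(0)          # rewinding to the start makes the data readable again
third = data.read()
print(third == first)  # True
```

Reading once into a variable avoids having to track or reset the file position at all.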
You should read the file into a variable, and then use that instead of calling data.read() repeatedly.
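You also need to use re.search(), not re.match(): match() only looks for the pattern at the very beginning of the string, while search() scans the whole string for it. See What is the difference between re.search and re.match?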
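To illustrate the difference on the question's data, here is a small sketch; the file contents are inlined as a string, which is an assumption made only so the snippet is self-contained.

```python
import re

contents = "google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n"
regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

# match() only tries position 0 of the string.
print(regex_1.match(contents) is not None)   # True: the text happens to start with "google"
print(regex_2.match(contents) is not None)   # False: "yahoo" first appears on a later line

# search() scans forward until it finds the pattern anywhere in the string.
print(regex_2.search(contents) is not None)  # True
```

So even after the reading problem is fixed, match() would still report "No match found" for the second pattern.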
import re

file = 'data_files/test_files/content.txt'
regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

# Read the whole file once; further data.read() calls would return ''.
with open(file, 'r') as data:
    contents = data.read()

print("Checking Regex 1")
# search() looks anywhere in the text; match() only checks the start.
if regex_1.search(contents):
    count_c = len(regex_1.findall(contents))
    print("Matched Regex 1 - " + str(count_c))
print("Checking Regex 2")
if regex_2.search(contents):
    count_d = len(regex_2.findall(contents))
    print("Matched Regex 2 - " + str(count_d))
else:
    print("No match found")
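Since the question mentions checking a list of patterns, the same idea can be written as a loop instead of one if-block per regex. This is a hypothetical generalization, not part of the answer above: the file text is inlined as a string (in practice, read the file once into contents as recommended above), and the extra "bing" pattern is invented purely to show the no-match branch.

```python
import re

# Stand-in for the file contents; read the real file once into this variable.
contents = "google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n"
# "bing" is a hypothetical pattern added only to exercise the no-match case.
patterns = [re.compile("google"), re.compile("yahoo"), re.compile("bing")]

counts = {}
for regex in patterns:
    # findall() returns every non-overlapping match; an empty list means no match,
    # so a separate search() call is not needed.
    matches = regex.findall(contents)
    counts[regex.pattern] = len(matches)
    if matches:
        print("Matched " + regex.pattern + " - " + str(len(matches)))
    else:
        print("No match found for " + regex.pattern)
```

Using findall() alone scans the text once per pattern, whereas calling search() and then findall() scans it twice.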
Upvotes: 1