Karthik

Reputation: 363

Matching pattern from file content and count occurrences

I am trying to read the contents of a file and check whether it matches any of a list of patterns, using regular expressions.

File content:

google.com
https://google.com
yahoo.com
www.yahoo.com
yahoo

My code:

import re
file = 'data_files/test_files/content.txt'

regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

data = open(file, 'r')

print ("Checking Regex 1")
if regex_1.match(data.read()):
    count_c = len(regex_1.findall(data.read()))
    print ("Matched Regex 1 - " + str(count_c))
print("Checking Regex 2")

if regex_2.match(data.read()):
    count_d = len(regex_2.findall(data.read()))
    print("Matched Regex 2 -  " + str(count_d))
else:
    print ("No match found")

Output:

Checking Regex 1
Checking Regex 2
No match found

I can't figure out what is wrong here.

Upvotes: 0

Views: 541

Answers (1)

Barmar

Reputation: 780788

Every time you call data.read(), it starts reading from the place in the file where the last call finished. Since the first call reads the entire file (because you didn't specify a limit), all the remaining calls start reading from the end of the file, so they don't read anything.

You should read the file into a variable, and then use that instead of calling data.read() repeatedly.
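To see the read position behaviour in isolation, here is a minimal sketch. The temporary file and its two-line content are assumptions made for the demo, not the asker's real file:

```python
import os
import tempfile

# Create a small throwaway file (hypothetical content, loosely based on the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("google.com\nyahoo.com\n")
    path = tmp.name

with open(path, "r") as data:
    first = data.read()   # reads the whole file; the position is now at the end
    second = data.read()  # nothing left to read, so this is an empty string
    data.seek(0)          # rewind to the beginning of the file
    third = data.read()   # the full content again

os.remove(path)

print(repr(first))     # 'google.com\nyahoo.com\n'
print(repr(second))    # ''
print(first == third)  # True
```

Rewinding with seek(0) works, but storing the result of a single read() in a variable is usually simpler.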

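You also need to use re.search(), not re.match(). match() only succeeds when the pattern matches at the very beginning of the string, whereas search() looks for it anywhere in the string. See What is the difference between re.search and re.match?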
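As a quick illustration of how match() and search() differ, the sketch below uses a string that mirrors the file content from the question (the variable names are just for the demo):

```python
import re

pattern = re.compile("yahoo")

# Same lines as the sample file in the question, held in a string.
text = "google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n"

at_start = pattern.match(text)      # None: the text does not begin with "yahoo"
anywhere = pattern.search(text)     # a match object: "yahoo" occurs later in the text
count = len(pattern.findall(text))  # number of non-overlapping occurrences

print(at_start)          # None
print(anywhere.group())  # yahoo
print(count)             # 3
```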

import re
file = 'data_files/test_files/content.txt'

regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

with open(file, 'r') as data:
    contents = data.read()  # read the file once and reuse the string below

print ("Checking Regex 1")
if regex_1.search(contents):
    count_c = len(regex_1.findall(contents))
    print ("Matched Regex 1 - " + str(count_c))

print("Checking Regex 2")
if regex_2.search(contents):
    count_d = len(regex_2.findall(contents))
    print("Matched Regex 2 -  " + str(count_d))
else:
    print ("No match found")
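If you end up with more than two patterns, one possible way to avoid repeating the same block is a small helper like the one below. count_occurrences is a hypothetical name, and the sample string stands in for the file content, so treat this as a sketch rather than a drop-in replacement:

```python
import re

def count_occurrences(text, patterns):
    """Return a dict mapping each pattern to its number of non-overlapping matches."""
    return {p: len(re.findall(p, text)) for p in patterns}

# Stand-in for the file content from the question.
sample = "google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n"

counts = count_occurrences(sample, ["google", "yahoo", "bing"])
for pattern, n in counts.items():
    if n:
        print("Matched " + pattern + " - " + str(n))
    else:
        print("No match found for " + pattern)
```

With a real file you would pass the string returned by a single read() call instead of sample.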

Upvotes: 1
