Reputation: 453
I created a file, 'file1.txt', that contains the pattern "hello". The program below counts the number of lines correctly, but the count for the pattern "hello" is wrong: if a line contains 3 occurrences of 'hello', it is still counted only once. My file contains 13 occurrences of 'hello' in total, and 2 of the lines contain 3 occurrences each, so the answer I get is 9 instead of 13, because each line contributes at most 1 to the count. How can I solve this?
import re

def reg_exp():
    pattern = 'hello'
    infile = open('file1.txt', 'r')
    match_count = 0
    lines = 0
    for line in infile:
        match = re.search(pattern, line)
        if match:
            match_count += 1
        lines += 1
    return (lines, match_count)

if __name__ == "__main__":
    lines, match_count = reg_exp()
    print 'LINES::', lines
    print 'MATCHES::', match_count
Upvotes: 2
Views: 1262
Reputation: 6511
That is how re.search() works: it returns as soon as it finds the first match, so each line can add at most 1 to your count. You could iterate over the matches with re.finditer(), or use re.findall() to return all matches for each line.
for line in infile:
    match = re.findall(pattern, line)
    if match:
        match_count += len(match)
    lines += 1
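A minimal, self-contained sketch of the same idea, assuming the lines arrive as any iterable of strings (an open file works too). The function name count_matches and the sample lines are made up for illustration and stand in for the contents of file1.txt. It uses re.finditer(), so no per-line list of matches is built.

```python
import re

def count_matches(lines, pattern='hello'):
    """Return (number_of_lines, number_of_pattern_occurrences)."""
    regex = re.compile(pattern)
    line_count = 0
    match_count = 0
    for line in lines:
        line_count += 1
        # finditer() yields one match object per non-overlapping occurrence
        match_count += sum(1 for _ in regex.finditer(line))
    return line_count, match_count

# Hypothetical sample data standing in for file1.txt
sample = ["hello world\n", "hello hello hello\n", "no match here\n"]
lines, matches = count_matches(sample)
print("LINES:: %d" % lines)
print("MATCHES:: %d" % matches)
```

To run it against the real file, something like count_matches(open('file1.txt')) should work, since a file object iterates line by line.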
re.search(pattern, string, flags=0)
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
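The documented behaviour above can be checked on a single line of text. The snippet below is only an illustration; the sample string and the variable names are invented for it.

```python
import re

line = "hello, hello and hello"

# search() stops at the first occurrence and returns one match object
first = re.search('hello', line)
first_span = first.span()

# findall() collects every non-overlapping occurrence as a list of strings
all_hits = re.findall('hello', line)

# With one capturing group, findall() returns that group's text per match
one_group = re.findall('(hel)lo', line)

# With several groups, each match becomes a tuple of group texts
two_groups = re.findall('(hel)(lo)', line)

# groups() on a single match has one entry per capturing group in the
# pattern, regardless of how often the pattern occurs in the line
groups_of_first = re.search('(hello)', line).groups()

print(len(all_hits))
```

Here len(all_hits) is the per-line occurrence count, while len(groups_of_first) only reflects how many capturing groups the pattern itself contains.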
Upvotes: 2
Reputation: 967
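import re

def reg_exp():
    pattern = '(hello)'
    infile = open('file1.txt', 'r')
    match_count = 0
    lines = 0
    for line in infile:
        match = re.search(pattern, line)
        if match:
            # groups() has one entry per capturing group in the pattern (1 here),
            # so this adds 1 per matching line, not 1 per occurrence
            match_count += len(match.groups())
        lines += 1
    return (lines, match_count)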
Upvotes: -1