Tanay Suthar

Reputation: 453

Count the Number of string patterns using regular expressions from a File

I created a file 'file1.txt' that contains the search pattern "hello". The program below counts the number of matching lines correctly, but the count of the pattern "hello" itself is wrong. If a line contains 3 occurrences of 'hello', it is still counted only once. My file contains 13 occurrences of 'hello' in total, and 2 of the lines contain 3 occurrences each, so the answer I get is 9 instead of 13: each line counts as only 1. How can I solve this?

import re

def reg_exp():
    pattern = 'hello'
    infile = open('file1.txt', 'r')
    match_count = 0
    lines = 0

    for line in infile:
        match = re.search(pattern, line)
        if match:
            match_count += 1
            lines += 1
    return (lines, match_count)

if __name__ == "__main__":
    lines, match_count = reg_exp()
    print 'LINES::', lines
    print 'MATCHES::', match_count

Upvotes: 2

Views: 1262

Answers (2)

Mariano

Reputation: 6511

That's how re.search() works: it returns as soon as it finds the first match, so each line adds at most 1 to the count. You could iterate with re.finditer(), or use re.findall() to return all matches for each line.

for line in infile:
    match = re.findall(pattern, line)
    if match:
        match_count += len(match)
        lines += 1

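For a version that can be run without 'file1.txt', here is a minimal sketch of the same loop applied to an in-memory list of lines. The helper name count_matches and the sample lines are made up for illustration; they are not from the question. The print calls are written so that they work on both Python 2 and Python 3.

```python
import re

def count_matches(lines, pattern='hello'):
    """Return (lines_with_a_match, total_matches) for an iterable of lines."""
    regex = re.compile(pattern)
    line_count = 0
    match_count = 0
    for line in lines:
        found = regex.findall(line)   # every non-overlapping match on this line
        if found:
            line_count += 1           # the line counts once
            match_count += len(found) # each occurrence counts separately
    return line_count, match_count

# Hypothetical sample data standing in for the contents of file1.txt.
sample = [
    "hello world\n",
    "hello hello hello\n",
    "no greeting here\n",
    "say hello, then hello again\n",
]

lines, matches = count_matches(sample)
print("LINES:: %d" % lines)      # LINES:: 3
print("MATCHES:: %d" % matches)  # MATCHES:: 6
```

Because a file object is also an iterable of lines, the same function should work when passed an open file, for example inside a with open('file1.txt') block.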


re.search(pattern, string, flags=0)

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
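As a small illustration of that description (the sample string and the patterns 'goodbye' and 'x*' are chosen only for this sketch), re.search() stops at the first occurrence, returns None when nothing matches, and can still return a match object for a zero-length match:

```python
import re

line = "hello hello hello"

# Only the first occurrence is reported, even though there are three.
first = re.search('hello', line)
print(first.span())        # (0, 5)

# No position matches, so the result is None.
missing = re.search('goodbye', line)
print(missing is None)     # True

# 'x*' can match zero characters, so this is a match object, not None.
empty = re.search('x*', line)
print(empty is not None)   # True
print(empty.span())        # (0, 0)
```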


re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
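To make the return shapes concrete, here is a short sketch. The patterns with groups are invented for illustration only. It also shows counting with re.finditer(), which avoids building a list when only the total is needed.

```python
import re

line = "hello hello hello"

# No groups: a list of the matched strings.
plain = re.findall(r'hello', line)
print(plain)       # ['hello', 'hello', 'hello']

# One group: a list of that group's text, one entry per match.
one_group = re.findall(r'(hel)lo', line)
print(one_group)   # ['hel', 'hel', 'hel']

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(hel)(lo)', line)
print(two_groups)  # [('hel', 'lo'), ('hel', 'lo'), ('hel', 'lo')]

# Matches never overlap: 'aa' is found twice in 'aaaa', not three times.
non_overlapping = re.findall(r'aa', 'aaaa')
print(non_overlapping)  # ['aa', 'aa']

# Counting with finditer(): one match object per occurrence.
total = sum(1 for _ in re.finditer(r'hello', line))
print(total)       # 3
```

In every case the length of the returned list equals the number of matches, so len() gives the per-line count whether or not the pattern contains groups.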

Upvotes: 2

Brian

Reputation: 967

def reg_exp():
    pattern = '(hello)'
    infile = open('file1.txt', 'r')
    match_count = 0
    lines = 0

    for line in infile:
        match = re.search(pattern, line)
        if match:
            match_count += len(match.groups())
            lines += 1
    return (lines, match_count)

Upvotes: -1
