Reputation: 12413
I have a Python script that, given a pattern, goes over a file and, for each line that matches the pattern, counts how many times that line shows up in the file.
The script is the following:
#!/usr/bin/env python
import time

fnamein = 'Log.txt'

def filter_and_count_matches(fnamein, fnameout, match):
    fin = open(fnamein, 'r')
    curr_matches = {}
    order_in_file = [] # need this because dict has no particular order
    for line in (l for l in fin if l.find(match) >= 0):
        line = line.strip()
        if line in curr_matches:
            curr_matches[line] += 1
        else:
            curr_matches[line] = 1
            order_in_file.append(line)
    #
    fout = open(fnameout, 'w')
    #for line in order_in_file:
    for line, _dummy in sorted(curr_matches.iteritems(),
            key=lambda (k, v): (v, k), reverse=True):
        fout.write(line + '\n')
        fout.write(' = {}\n'.format(curr_matches[line]))
    fout.close()

def main():
    for idx, match in enumerate(open('staffs.txt', 'r').readlines()):
        curr_time = time.time()
        match = match.strip()
        fnameout = 'm{}.txt'.format(idx+1)
        filter_and_count_matches(fnamein, fnameout, match)
        print 'Processed {}. Time = {}'.format(match, time.time() - curr_time)

main()
So right now I am going over the file each time I want to check for a different pattern. Is it possible to do this by going over the file just once? (The file is quite big, so it takes a while to process.) It would be nice to be able to do this in an elegant, "easy" way. Thanks!
Upvotes: 1
Views: 1981
Reputation: 336168
Looks like a Counter would do what you need:
from collections import Counter
lines = Counter([line for line in myfile if match_string in line])
For example, if myfile contains
123abc
abc456
789
123abc
abc456
and match_string is "abc", then the above code gives you
>>> lines
Counter({'123abc': 2, 'abc456': 2})
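The example above can be reproduced without a real file. This is a minimal sketch assuming Python 3, with io.StringIO standing in for the open file. Lines read from a file keep their trailing newline, so the sketch strips each one to get the keys shown above.

```python
from collections import Counter
import io

# Stand-in for an open file; the contents mirror the example above.
myfile = io.StringIO("123abc\nabc456\n789\n123abc\nabc456\n")
match_string = "abc"

# Strip each line so the trailing newline does not end up in the keys.
lines = Counter(line.strip() for line in myfile if match_string in line)
print(lines)
```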
For multiple patterns, how about this:
patterns = ["abc", "123"]
# initialize one Counter for each pattern
results = {pattern:Counter() for pattern in patterns}
for line in myfile:
    for pattern in patterns:
        if pattern in line:
            results[pattern][line] += 1
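To connect this back to the script in the question, here is a rough sketch, assuming Python 3, of a single pass that fills one Counter per pattern and then writes each report sorted by count, as the original script does. The helper names count_matches and write_report are made up for illustration, and io.StringIO objects stand in for the log and output files.

```python
from collections import Counter
import io

def count_matches(lines, patterns):
    """Read the lines once and count matching lines per pattern."""
    results = {pattern: Counter() for pattern in patterns}
    for line in lines:
        line = line.strip()
        for pattern in patterns:
            if pattern in line:
                results[pattern][line] += 1
    return results

def write_report(counter, out):
    """Write each line and its count, highest (count, line) first."""
    ordered = sorted(counter.items(),
                     key=lambda kv: (kv[1], kv[0]), reverse=True)
    for line, count in ordered:
        out.write(line + '\n')
        out.write(' = {}\n'.format(count))

# In-memory stand-ins for the log file and one output file.
log = io.StringIO("123abc\nabc456\n789\n123abc\nabc456\n")
results = count_matches(log, ["abc", "123"])
report = io.StringIO()
write_report(results["abc"], report)
print(report.getvalue())
```

With real files, the same functions could be given open('Log.txt') as input and one output file per pattern, named as in the question ('m{}.txt'.format(idx+1)). The patterns would be read from staffs.txt and stripped first. The file is still read only once, but every line is tested against every pattern, so the work grows with the number of patterns.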
Upvotes: 2