Reputation: 13
I am trying to count how many times each string occurs in a text file, where each string is on its own line.
For example, a file may look like this:
jump start
jump go
feet start
jump go
The target tally would be 1 for every string except "jump go", which would have 2.
So far, I have been successful at finding individual word counts using this code:
import re
import collections
with open('file.txt') as f:
    text = f.read()
words = re.findall(r'\w+', text)
counts = collections.Counter(words)
print(counts)
However, this only gives output like: jump = 3, start = 2, go = 2, feet = 1
Not sure if this matters, but the file will have around 5 million lines, with around 12,000 distinct strings.
Thank you for any help!
Upvotes: 0
Views: 976
Reputation: 2567
I got this to work:
import collections
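# Read every line, stripping the trailing newline and surrounding whitespace
with open('results.txt') as f:
    lines = [line.strip() for line in f]
print(lines)
counts = collections.Counter(lines)
print(counts)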
Output:
['Sam', 'sam', 'johm go', 'johm go', 'johm for']
Counter({'johm go': 2, 'sam': 1, 'Sam': 1, 'johm for': 1})
Upvotes: 2
Reputation: 716
Instead of using the regex, read the file with words = f.readlines(). You'll end up with a list of strings, one per line (each still ending in its newline character, which you will want to strip). Then build the counter from that list.
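For a file of around 5 million lines, a variation on this idea is to iterate over the file object directly rather than building the full list first. This is only a minimal sketch: the function name count_lines is hypothetical, and it assumes that lines should match exactly apart from the trailing newline.

```python
import collections

def count_lines(path):
    """Count how many times each distinct line occurs in the file at `path`."""
    with open(path) as f:
        # Iterating over a file yields lines with the newline still attached,
        # so strip it before counting.
        return collections.Counter(line.rstrip('\n') for line in f)
```

Calling count_lines('file.txt').most_common(10) would then list the ten most frequent lines along with their counts.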
Upvotes: 0