Python - Finding string frequencies of list of strings in text file

Question

I am trying to count how many times each string occurs in a text file, where each string is on its own line.

For example, a file may look like this:

jump start
jump go
feet start
jump go

The target tally would be 1 for every string except "jump go", which would be 2.

So far, I have been successful at finding individual word counts using this code:

import re
import collections
with open('file.txt') as f:
    text = f.read()
words = re.findall(r'\w+',text)
counts = collections.Counter(words)
print(counts)

However, this only gives output like: jump = 3, start = 2, go = 2, feet = 1

Not sure if this matters, but the file will have around 5 million lines, with around 12,000 distinct strings.

Thank you for any help!

Fran Borcic · Accepted Answer

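Instead of using the regex, read the file with words = f.readlines(). You'll end up with a list of strings, one per line. Then build the Counter from that list. Note that each string keeps its trailing newline, so strip it before counting if the last line of the file may not end with one.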
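The idea above can be sketched roughly as follows. This is only a minimal illustration, not necessarily the exact code the answer had in mind: the function name count_lines is made up for the example, it iterates over the file object instead of calling readlines() so that the 5 million lines are not all held in memory at once, and it assumes blank lines should be ignored.

```python
import collections


def count_lines(path):
    """Return a Counter mapping each distinct line to how often it occurs."""
    counts = collections.Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Drop the trailing newline so "jump go\n" and a final
            # "jump go" without a newline are counted as the same string.
            line = line.rstrip("\n")
            # Assumption: empty lines are not strings of interest.
            if line:
                counts[line] += 1
    return counts
```

Calling count_lines('file.txt') on the example file from the question should give 2 for "jump go" and 1 for each of "jump start" and "feet start". With about 12,000 distinct strings the Counter itself stays small, and counts.most_common() lists them from most to least frequent.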
