user3696118

Reputation: 353

How to count a set of strings in a file in Python?

I've looked through a few threads on here, but none of them quite matches my situation.

I have a text file that looks something like this:

orange 0 0 0
orange 1 0 0
orange 2 0 0
orange 3 0 0
orange 4 0 0
orange 5 0 0
apple 0 0 0
apple 1 0 0
apple 2 0 0
apple 3 0 0
apple 4 0 0
apple 5 0 0
grapes 0 0 0
grapes 1 0 0
grapes 2 0 0
grapes 3 0 0
grapes 4 0 0
grapes 5 0 0

What I need to do is take the first word as a string and count how many lines contain that first word, then move on to the next word and count how many lines contain that one. So the result should look something like this:

firstTermCount: 6
secondTermCount: 6
thirdTermCount: 6

I need this count so that, in the next step, I can run a command over exactly as many lines as that string occurs on, to make use of the numbers next to each word.

The issue is that I have no idea what those terms will actually be called, so I can't use the "Count" or "count_dict" technique I keep seeing, since it seems to me that you need a known name for the function to look for. I also have no idea how many lines a file will contain, so I would have to work it out each time I read one. I know the example above has six lines per word, but the files I actually want to read will have an arbitrary number of lines per word, so I can't just say "look for it 6 times".

Could anyone provide a solution to this issue, or perhaps a link to a thread that answers this question that I may have missed...?

Thank you

Note: I am using Python v2.6.4, if that helps

EDIT: A user suggested that I use Counter, or the dictionary method, but neither gives me quite the result I need. For example, using the Counter method (I used a workaround listed here:

new list:
orange 0 0 0
orange 1 0 0
orange 2 0 0
orange 3 0 0
orange 4 0 0
apple 1 0 0
apple 2 0 0
apple 4 0 0
apple 5 0 0
grapes 1 0 0
grapes 2 0 0
grapes 4 0 0
peaches 0 0 0
peaches 1 0 0
peaches 2 0 0
peaches 3 0 0
peaches 5 0 0
peaches 6 0 0

And this is what the Counter method gives me:

{'orange': 5, 'peaches': 6, 'apple': 4, 'grapes': 3}

when what I want is this:

{'orange': 5, 'apple': 4, 'grapes': 3, 'peaches': 6}

How can I get these counts in this order?

Upvotes: 0

Views: 302

Answers (2)

xecgr

Reputation: 5193

Counter is what you need: https://docs.python.org/2/library/collections.html#collections.Counter

>>> from collections import Counter
>>> with open('foo.data', 'r') as foo:
...     lines = foo.readlines()
...
>>> c = Counter([l.split(" ")[0] for l in lines])
>>> c
Counter({'orange': 6, 'apple': 6, 'grapes': 6})

Counter is new in Python 2.7, so for 2.6 here's a "manual" solution that keeps the words in the order they first appear in the file:

>>> manual_dict = {}
>>> with open('foo.data', 'r') as foo:
...     lines = foo.readlines()
...
>>> for idx, l in enumerate(lines):
...     word = l.split(" ")[0]
...     if word not in manual_dict:
...         # Record the index of the first line where this word appears.
...         manual_dict[word] = {'count': 0, 'pos': idx}
...     manual_dict[word]['count'] += 1
...
>>> # Sort by first-seen position to recover the file order.
>>> for w, w_config in sorted(manual_dict.items(), key=lambda x: x[1]['pos']):
...     print w, w_config['count']
...
orange 5
apple 4
grapes 3
peaches 6
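
For the follow-up step in the question (looping over exactly the lines that belong to each word), a sketch with itertools.groupby, which is available in 2.6, may be handy. It assumes that lines sharing a first word are consecutive, as in the sample file; if a word can reappear later in the file, each run is reported separately. The names count_runs and sample are made up for illustration.

```python
from itertools import groupby


def count_runs(lines):
    """Return [(first_word, count, rows)] for each run of consecutive
    lines that share the same first word, in file order."""
    # Skip blank lines and split every remaining line on whitespace.
    rows = [line.split() for line in lines if line.strip()]
    result = []
    for word, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # row[1:] holds the numbers that follow the word on each line.
        result.append((word, len(group), [row[1:] for row in group]))
    return result


# Hypothetical sample, trimmed from the question's data.
sample = [
    "orange 0 0 0\n",
    "orange 1 0 0\n",
    "apple 1 0 0\n",
    "apple 2 0 0\n",
    "apple 4 0 0\n",
    "grapes 1 0 0\n",
]

for word, count, rows in count_runs(sample):
    print("%s %d" % (word, count))
```

With a real file, count_runs(open('foo.data')) should work the same way, since a file object yields its lines one at a time.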

Upvotes: 2

Burhan Khalid

Reputation: 174624

I think the problem is that you want the words listed in the order they were found in the file, plus their counts. In Python 2, dictionaries (and Counter, because it's just a fancy dictionary) are unordered because their purpose is quick lookups.

The collections module has OrderedDict, along with a link to an alternative implementation if you are not using 2.7.

You could choose to implement that, or you can do something simpler by collecting the words in a list (to preserve order), and their counts in a dictionary:

from __future__ import with_statement  # only needed on Python 2.5

counts = dict()   # word -> number of lines that start with it
words = list()    # words in the order they first appear

with open('somefile.txt') as f:
    for line in f:
        bits = line.split()
        if not bits:
            continue  # skip blank lines
        if bits[0] not in counts:
            words.append(bits[0])
            counts[bits[0]] = 1
        else:
            counts[bits[0]] += 1

for word in words:
    print('Word: %s\tCount: %s' % (word, counts[word]))
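
If Python 2.7 or later is available, the OrderedDict route could look roughly like the sketch below. This is only an illustration: ordered_counts is a hypothetical helper name, and the sample lines are trimmed from the question's data.

```python
from collections import OrderedDict  # new in Python 2.7 / 3.1


def ordered_counts(lines):
    """Count lines per first word, keeping first-seen order."""
    counts = OrderedDict()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Updating an existing key does not change its position.
        counts[parts[0]] = counts.get(parts[0], 0) + 1
    return counts


# Hypothetical sample, trimmed from the question's data.
sample = [
    "orange 0 0 0\n",
    "orange 1 0 0\n",
    "apple 1 0 0\n",
    "grapes 1 0 0\n",
    "peaches 0 0 0\n",
    "peaches 1 0 0\n",
]

for word, count in ordered_counts(sample).items():
    print("%s %d" % (word, count))
```

The counting logic is the same as in the list-plus-dict version; the OrderedDict just replaces the separate words list.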

Upvotes: 0
