bdhar

Reputation: 22975

Building count dictionary from statistics file

I have a statistics file like this:

dict-count.txt

apple   15
orange  12
mango   10
apple   1
banana  14
mango   4

I need to sum the counts for each key and build a dictionary like this: {'orange': 12, 'mango': 14, 'apple': 16, 'banana': 14}. This is what I do at the moment:

from __future__ import with_statement

with open('dict-count.txt') as f:
    lines = f.readlines()

output = {}

for line in lines:
    key, val = line.split('\t')
    output[key] = output.get(key, 0) + int(val)

print output

I am particularly concerned about this part:

key, val = line.split('\t')
output[key] = output.get(key, 0) + int(val)

Is there a better way to do this, or is this the only way?

Thanks.

Upvotes: 1

Views: 378

Answers (2)

steveha

Reputation: 76695

For a small file, .readlines() is fine, but it slurps the entire contents of the file into memory in one go. You can instead use the file object f as an iterator; iterating over it yields one line of input at a time.

So, the easiest way to write this is to use a defaultdict as @Amber already showed, but my version doesn't build a list of input lines; it just builds the dictionary as it goes.

I used terse variable names, like d for the dict instead of output.

from __future__ import with_statement
from collections import defaultdict
from operator import itemgetter

d = defaultdict(int)

with open('dict-count.txt') as f:
    for line in f:
        k, v = line.split()
        d[k] += int(v)

lst = list(d.items())  # explicit list, so .sort() also works where items() returns a view (Python 3)

# sort twice: once for alphabetical order, then for frequency (descending).
# Because the Python sort is "stable", we will end up with descending
# frequency, but alphabetical order for any frequency values that are equal.
lst.sort(key=itemgetter(0))
lst.sort(key=itemgetter(1), reverse=True)

for key, value in lst:
    print("%10s| %d" % (key, value))
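The two stable sorts above can also be collapsed into a single sort with a compound key. This is only a minimal sketch, not part of the original answer: the helper name sort_by_count and the hard-coded sample totals are assumptions made for illustration.

```python
def sort_by_count(counts):
    """Return (key, count) pairs by descending count, then alphabetically."""
    # Negating the count makes larger counts sort first, while the key
    # itself breaks ties in ascending alphabetical order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Totals for the sample file in the question (hard-coded for illustration).
sample = {'orange': 12, 'mango': 14, 'apple': 16, 'banana': 14}

for key, value in sort_by_count(sample):
    print("%10s| %d" % (key, value))
```

The negation trick only works because the counts are numbers. For a non-numeric secondary key, the two-pass stable sort is probably the simpler choice.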

Upvotes: 4

Amber

Reputation: 526483

Use a defaultdict:

from __future__ import with_statement
from collections import defaultdict

output = defaultdict(int)

with open('dict-count.txt') as f:
    for line in f:
        key, val = line.split('\t')
        output[key] += int(val)

print output
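To see why defaultdict(int) can stand in for output.get(key, 0), here is a small sketch that runs both approaches on the sample rows from the question. The function names sum_with_get and sum_with_defaultdict and the in-memory SAMPLE_LINES list are assumptions for illustration, and it uses split() with no argument so that either tabs or spaces would be accepted.

```python
from collections import defaultdict

# Sample rows from the question, held in memory instead of read from a file.
SAMPLE_LINES = [
    "apple\t15\n",
    "orange\t12\n",
    "mango\t10\n",
    "apple\t1\n",
    "banana\t14\n",
    "mango\t4\n",
]


def sum_with_get(lines):
    """Accumulate counts in a plain dict, as in the question."""
    totals = {}
    for line in lines:
        key, val = line.split()
        totals[key] = totals.get(key, 0) + int(val)
    return totals


def sum_with_defaultdict(lines):
    """Accumulate counts in a defaultdict; missing keys start at int() == 0."""
    totals = defaultdict(int)
    for line in lines:
        key, val = line.split()
        totals[key] += int(val)
    return dict(totals)  # convert so it prints like a plain dict
```

Both functions should produce the same totals. The defaultdict version just moves the "start at zero" rule out of the loop body and into the container.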

Upvotes: 3
