Reputation: 22975
I have a statistics file like this:
dict-count.txt
apple 15
orange 12
mango 10
apple 1
banana 14
mango 4
I need to sum the counts for each element and build a dictionary like this: {'orange': 12, 'mango': 14, 'apple': 16, 'banana': 14}. I do the following to achieve this:
from __future__ import with_statement
with open('dict-count.txt') as f:
    lines = f.readlines()
output = {}
for line in lines:
    key, val = line.split('\t')
    output[key] = output.get(key, 0) + int(val)
print(output)
I am particularly concerned about this part:
key, val = line.split('\t')
output[key] = output.get(key, 0) + int(val)
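The two parsing calls used on this page, line.split('\t') and line.split(), behave differently on the same input line. This is a small Python 3 sketch with made-up sample lines; whether the real dict-count.txt is tab- or space-separated is an assumption, since the listing above could be either.

```python
# Sketch: how split('\t') and split() treat one line of input.
tab_line = "apple\t15\n"
space_line = "apple 15\n"

# split('\t') keeps the trailing newline on the value; int() tolerates it.
key, val = tab_line.split('\t')
print(repr(val), int(val))

# split() with no argument splits on any run of whitespace and drops the newline.
print(space_line.split())

# On a space-separated line, split('\t') finds no tab, so unpacking fails.
try:
    key, val = space_line.split('\t')
except ValueError as exc:
    print("ValueError:", exc)
```

So split() is the more forgiving choice if the separator might be either a tab or spaces.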
Is there a better way to do this? Or is this the only way?
Thanks.
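One more option, not taken from either answer below, is collections.Counter (Python 2.7+), a dict subclass meant for tallying that returns 0 for missing keys. This is a Python 3 sketch; io.StringIO stands in for open('dict-count.txt'), and the data is copied from the question.

```python
import io
from collections import Counter

# Stand-in for open('dict-count.txt'); contents copied from the question.
sample = io.StringIO(
    "apple\t15\norange\t12\nmango\t10\napple\t1\nbanana\t14\nmango\t4\n"
)

totals = Counter()
for line in sample:
    # split() handles tabs or spaces and drops the trailing newline
    key, val = line.split()
    # A Counter yields 0 for unseen keys, so += needs no .get() call
    totals[key] += int(val)

print(dict(totals))
```

A Counter also offers most_common(), which returns the (key, count) pairs ordered from highest to lowest count.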
Upvotes: 1
Views: 378
Reputation: 76695
For a small file, you can use .readlines(), but that will slurp the entire contents of the file into memory in one go. Instead, you can use the file object f as an iterator; when you iterate over it, you get one line of input at a time.
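To illustrate that a file object hands out one line per iteration, here is a minimal Python 3 sketch. io.StringIO stands in for a real file (it iterates line by line the same way), and sum_counts is a hypothetical helper name used only for this example.

```python
import io

def sum_counts(lines):
    """Hypothetical helper: total the values per key from 'key value' lines."""
    totals = {}
    for line in lines:  # a file object yields one line per iteration
        key, val = line.split()
        totals[key] = totals.get(key, 0) + int(val)
    return totals

f = io.StringIO("apple 15\norange 12\napple 1\n")
first = next(f)        # pulls exactly one line
rest = sum_counts(f)   # iteration resumes at the second line
print(repr(first), rest)
```

Because the loop never holds more than the current line, memory use stays flat no matter how large the file is.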
So, the easiest way to write this is to use a defaultdict as @Amber already showed, but my version doesn't build a list of input lines; it just builds the dictionary as it goes.
I used terse variable names, like d for the dict instead of output.
from __future__ import with_statement
from collections import defaultdict
from operator import itemgetter
d = defaultdict(int)
with open('dict-count.txt') as f:
    for line in f:
        k, v = line.split()
        d[k] += int(v)
lst = list(d.items())  # list() is needed on Python 3, where items() returns a view
# sort twice: once for alphabetical order, then for frequency (descending).
# Because the Python sort is "stable", we will end up with descending
# frequency, but alphabetical order for any frequency values that are equal.
lst.sort(key=itemgetter(0))
lst.sort(key=itemgetter(1), reverse=True)
for key, value in lst:
    print("%10s| %d" % (key, value))
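The two-pass stable sort in this answer can be checked against a single sort with a composite key. This Python 3 sketch uses the totals from the question; the one-pass lambda key is an alternative, not part of the original answer.

```python
from operator import itemgetter

# Totals from the question's sample data.
totals = {'orange': 12, 'mango': 14, 'apple': 16, 'banana': 14}

# Two stable passes: by name first, then by count descending.
two_pass = sorted(totals.items(), key=itemgetter(0))
two_pass.sort(key=itemgetter(1), reverse=True)

# One pass with a composite key: negate the count for descending order,
# then break ties by name in ascending order.
one_pass = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

print(two_pass)
print(two_pass == one_pass)
```

The composite key only works because the counts can be negated; the two-pass approach also handles keys that cannot, such as sorting strings in descending order. Note that reverse=True keeps the sort stable, which is why banana stays ahead of mango.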
Upvotes: 4
Reputation: 526483
Use a defaultdict:
from __future__ import with_statement
from collections import defaultdict
output = defaultdict(int)
with open('dict-count.txt') as f:
    for line in f:
        key, val = line.split('\t')
        output[key] += int(val)
print(dict(output))  # dict() prints a plain dict instead of the defaultdict repr
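A short Python 3 sketch of what defaultdict(int) does here: int() returns 0, so a missing key starts at 0 before += runs. One difference from dict.get() is worth knowing: merely reading a missing key from a defaultdict inserts it.

```python
from collections import defaultdict

output = defaultdict(int)   # int() returns 0, the default for unseen keys
output['apple'] += 15       # 'apple' is missing, so it starts at 0
output['apple'] += 1
print(output['apple'])
print(output['pear'])       # also 0, but this read inserts 'pear'
print(sorted(output))

plain = {}
print(plain.get('pear', 0)) # 0, and plain is left unchanged
print(plain)
```

For a pure accumulation loop like the one above this side effect never matters, but it can surprise you if you later probe the dictionary for keys that were not in the file.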
Upvotes: 3