Reputation: 786
I have a file with the following input data:
IN OUT
data1 2.3 1.3
data2 0.1 2.1
data3 1.5 2.8
dataX ... ...
There are thousands of such files, and each has the same rows data1, data2, data3, ..., dataX. I'd like to count how many times each value occurs, for each data row and each column, across all files. Example:
In file 'data1-IN' (filename)
2.3 - 50 (times)
0.1 - 233 (times)
... - ... (times)
In file 'data1-OUT' (filename)
2.1 - 1024 (times)
2.8 - 120 (times)
... - ... (times)
In file 'data2-IN' (filename)
0.4 - 312 (times)
0.3 - 202 (times)
... - ... (times)
In file 'data2-OUT' (filename)
1.1 - 124 (times)
3.8 - 451 (times)
... - ... (times)
In file 'data3-IN' ...
Which Python data structure would be best for counting such data? I wanted to use a multidimensional dictionary, but I am struggling with KeyErrors, etc.
Upvotes: 0
Views: 310
Reputation: 3701
I have recently started using the pandas DataFrame. It has a CSV reader and makes slicing and dicing data very simple.
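For example, a minimal sketch (assuming whitespace-delimited files; the glob pattern 'data*.txt' is a placeholder for your thousands of input files):

import glob
import pandas as pd

# Read every file, skipping the 'IN OUT' header line and naming the columns.
frames = [pd.read_csv(f, sep=r'\s+', skiprows=1, names=['name', 'IN', 'OUT'])
          for f in glob.glob('data*.txt')]
all_data = pd.concat(frames, ignore_index=True)

# Count how often each value occurs, per data name and per column.
print(all_data.groupby('name')['IN'].value_counts())
print(all_data.groupby('name')['OUT'].value_counts())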
Upvotes: 1
Reputation: 1121824
You really want to use collections.Counter, perhaps contained in a collections.defaultdict:
import collections
import csv

counts = collections.defaultdict(collections.Counter)
for filename in files:
    with open(filename, 'rb') as f:
        reader = csv.reader(f, delimiter=' ')
        next(reader)  # skip the 'IN OUT' header line
        for row in reader:
            # row[0] is the data name; key the counters by it, not by the
            # source filename, so values are aggregated across all files
            counts[row[0] + '-IN'][row[1]] += 1
            counts[row[0] + '-OUT'][row[2]] += 1
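To print the result in the format from your question, one small sketch (most_common() sorts each counter by frequency):

for name in sorted(counts):
    print "In file '%s'" % name
    for value, times in counts[name].most_common():
        print '%s - %d (times)' % (value, times)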
Upvotes: 3