pb100

Reputation: 786

Python - how to count occurrences of each value per column?

I have a file with the following input data:

       IN   OUT
data1  2.3  1.3
data2  0.1  2.1
data3  1.5  2.8
dataX  ...  ...

There are thousands of such files, and each has the same rows data1, data2, data3, ..., dataX. I'd like to count how many times each value occurs for each data row and column across all files. Example:

In file 'data1-IN' (filename)

2.3 - 50    (times)
0.1 - 233   (times)
... - ...   (times)

In file 'data1-OUT' (filename)

2.1 - 1024 (times)
2.8 - 120  (times)
... - ...  (times)

In file 'data2-IN' (filename)

0.4 - 312    (times)
0.3 - 202   (times)
... - ...   (times)

In file 'data2-OUT' (filename)

1.1 - 124 (times)
3.8 - 451  (times)
... - ...  (times)

In file 'data3-IN' ...

Which Python data structure would be best for counting such data? I wanted to use a multidimensional dictionary, but I am struggling with KeyErrors etc.

Upvotes: 0

Views: 310

Answers (2)

Tooblippe

Reputation: 3701

I have recently started using the pandas DataFrame. pandas has a CSV reader and makes slicing and dicing data very simple.
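As a rough sketch of what that could look like for the layout in the question: the two in-memory sample files, the column names `label`, `IN`, `OUT`, and the helper `read_one` are all assumptions made for illustration, and the values are read as strings so that `2.3` and `2.30` would be counted as written rather than as floats.

```python
import io
import pandas as pd

# Two hypothetical input files in the question's layout, kept in memory
# so the sketch is self-contained. Real code would pass file paths instead.
sample_files = [
    "       IN   OUT\ndata1  2.3  1.3\ndata2  0.1  2.1\n",
    "       IN   OUT\ndata1  2.3  0.7\ndata2  0.4  2.1\n",
]

def read_one(source):
    # Skip the 'IN OUT' header and name the three whitespace-separated
    # columns explicitly; dtype=str keeps the values exactly as written.
    return pd.read_csv(source, sep=r"\s+", skiprows=1,
                       names=["label", "IN", "OUT"], dtype=str)

# Stack all files into one table, then reshape to one row per
# (label, column, value) observation.
table = pd.concat([read_one(io.StringIO(text)) for text in sample_files],
                  ignore_index=True)
long = table.melt(id_vars="label", var_name="column", value_name="value")

# Count how often each value occurs for each label and column.
counts = long.groupby(["label", "column", "value"]).size()
print(counts)
```

A single lookup such as `counts.loc[("data1", "IN", "2.3")]` then gives the number of times that value appeared for that row label and column across all files.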

Upvotes: 1

Martijn Pieters

Reputation: 1121824

You really want to use collections.Counter, perhaps contained in a collections.defaultdict:

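import collections
import glob

# One Counter per output name such as 'data1-IN'; missing keys start at zero, so no KeyError.
counts = collections.defaultdict(collections.Counter)
files = glob.glob('*.txt')  # assumption: adjust the pattern to match your input files

for filename in files:
    with open(filename) as f:
        next(f, None)  # skip the 'IN  OUT' header line
        for line in f:
            fields = line.split()  # columns are whitespace-separated
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            label, in_value, out_value = fields
            counts[label + '-IN'][in_value] += 1
            counts[label + '-OUT'][out_value] += 1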
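To get from the counters to the per-name output files described in the question, something along these lines might work. The helper `write_counts`, the sample counts, and the scratch directory are hypothetical; the `value - count` line format is only a guess at what the question intends.

```python
import collections
import os
import tempfile

# Hypothetical counts with the same shape as the structure above:
# a defaultdict mapping a name to a Counter of values.
counts = collections.defaultdict(collections.Counter)
counts['data1-IN'].update(['2.3', '2.3', '0.1'])
counts['data1-OUT'].update(['1.3'])

def write_counts(counts, out_dir):
    """Write one file per key, with one 'value - count' line per value."""
    for name, counter in counts.items():
        with open(os.path.join(out_dir, name), 'w') as out:
            # most_common() yields (value, count) pairs, highest count first.
            for value, n in counter.most_common():
                out.write('{} - {}\n'.format(value, n))

out_dir = tempfile.mkdtemp()  # assumption: a scratch directory for the demo
write_counts(counts, out_dir)

with open(os.path.join(out_dir, 'data1-IN')) as f:
    written = f.read().splitlines()
print(written)
```

Because a Counter returns 0 for values it has never seen and the defaultdict creates a fresh Counter for any new name, none of the lookups raise KeyError.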

Upvotes: 3
