Reputation: 41
I have a .csv file with 3 columns, let's say a, b, c, where c represents time and can take values from 00 to 24.
I want to go through this file, extract the unique (a, b, c) combinations, and count how many times each c occurs for a given a and b. For example, if the file looks like this:
a1 b1 c1
a1 b1 c1
a1 b1 c1
a1 b1 c2
a1 b1 c2
a1 b2 c1
a1 b2 c1
a2 b1 c1
a2 b1 c2
I want to create something like this:
{a1:{b1:{c1:3, c2:2},b2:{c1:2}},a2:{b1:{c1:1,c2:1}}}
But I'm not sure whether a nested dictionary is a good choice. If it is, I'm having difficulty implementing the "counter" part.
Upvotes: 0
Views: 152
Reputation: 567
You can still use a Counter to do the counting:
from collections import Counter

rows = [
    ('a1', 'b1', 'c1'),
    ('a1', 'b1', 'c1'),
    ('a1', 'b1', 'c1'),
    ('a1', 'b1', 'c2'),
    ('a1', 'b1', 'c2'),
    ('a1', 'b2', 'c1'),
    ('a1', 'b2', 'c1'),
    ('a2', 'b1', 'c1'),
    ('a2', 'b1', 'c2'),
]

# Tuples are hashable, so Counter can count each (a, b, c) row directly.
counts = Counter(rows)
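If the rows come from the .csv file rather than a hard-coded list, the same counting can be sketched with the standard csv module. This is only a minimal sketch: the helper name count_rows, the comma delimiter, and the file name in the comment are assumptions, not something given in the question.

```python
import csv
import io
from collections import Counter

def count_rows(lines, delimiter=','):
    """Count identical (a, b, c) rows from an iterable of CSV lines."""
    reader = csv.reader(lines, delimiter=delimiter)
    # Skip blank lines; convert each row to a tuple so it is hashable.
    return Counter(tuple(row) for row in reader if row)

# In-memory stand-in for the real file; with a file on disk, pass
# open('data.csv', newline='') instead (the file name is hypothetical).
sample = io.StringIO("a1,b1,c1\na1,b1,c1\na1,b2,c1\n")
counts_from_csv = count_rows(sample)
print(counts_from_csv[('a1', 'b1', 'c1')])
```

If the columns are separated by spaces, as in the sample in the question, pass delimiter=' ' instead.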
As for changing the data structure to a nested dictionary, you can do this with a plain dictionary using setdefault, or you can implement an autovivifying dictionary and use that:
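class AutoViv(dict):
    """Dictionary that creates a nested AutoViv for any missing key."""
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

nested = AutoViv()
# On Python 2, use counts.iteritems() instead of counts.items().
for (a, b, c), count in counts.items():
    nested[a][b][c] = count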
This matches your desired result:
>>> nested
{'a1': {'b1': {'c2': 2, 'c1': 3}, 'b2': {'c1': 2}}, 'a2': {'b1': {'c2': 1, 'c1': 1}}}
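For the plain-dictionary route with setdefault mentioned above, a minimal sketch could look like this. The function name nest_counts and the small example Counter are made up for illustration.

```python
from collections import Counter

def nest_counts(counts):
    """Turn a Counter keyed by (a, b, c) tuples into nested plain dicts."""
    nested = {}
    for (a, b, c), count in counts.items():
        # setdefault returns the existing inner dict or inserts a new one.
        nested.setdefault(a, {}).setdefault(b, {})[c] = count
    return nested

example = Counter([('a1', 'b1', 'c1'), ('a1', 'b1', 'c1'), ('a1', 'b2', 'c1')])
print(nest_counts(example))
```

One difference from the autovivifying class: the result is made of ordinary dicts, so looking up a key that does not exist raises KeyError instead of silently creating an empty entry.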
Upvotes: 1