Qanthelas
Qanthelas

Reputation: 524

Counting Based on Occurrences of a Dictionary's Key in a .CSV File

I have a dict that looks like this:

{100002: "['Apple', 'M', 'R', '500001', 'Fruit', '90']", 100004: "['Banana', 'M', 'Y', '500001', 'Fruit', '75']"}

The keys are integers and the values are strings.

I have a .csv file that looks like this:

100001,1
100001,1
100001,2
100002,1
100002,1
100002,3
100002,3
100003,1
100003,4
100004,2
100004,3
100004,3

I want to count the number of occurrences for each number in the second column for a given key and add that count to my dict. So with this sample 100001 would have a count of 2 for 1 and 1 for 2, 100002 would have a count of 2 for 1 and 2 for 3, 100003 would have a count of 1 for 1 and 1 for 4, and 100004 would have a count of 1 for 2 and 2 for 3. While this .csv file has data for a wide range of keys (of which the keys in my dict are a subset) I want to append these counts to my dict so that it might look like this (with 4 new values added for each key, one each for the counts of the numbers 1-4 in order.)

{100002: "['Apple', 'M', 'R', '500001', 'Fruit', '90', '2', '0', '2', '0']", 100004: "['Banana', 'M', 'Y', '500001', 'Fruit', '75', '0', '1', '2', '0']"}

These 4 added pieces are the counts of the numbers 1-4 in order so 100002 has '2', '0', '2', '0' because in the .csv file there are 2 lines of 100002,1 but 0 lines with 100002,2 and 2 lines with 100002,3 but 0 lines with 100002,4.

My question has two parts. 1) How can I count up the times when a key is followed by a number from 1-4 in a .csv file so that I have 4 counts (one each for the numbers 1-4)? 2) How can I add these counts into my dictionary?

ANSWER:

Based on the accepted answer I crafted this. It's a bit uglier than I'd like, but I managed to get the job done with it.

dd = defaultdict(lambda: defaultdict(int))
with open('AgentsCorpLevel.csv') as fin:
    csvin = csv.reader(fin)
    for row in csvin:
        if int(row[0]) in MyDict.keys():
            dd[int(row[0])][row[1]] += 1
print dd
dicts = MyDict,dd
#print dicts
FullDict = {}
PartlyCleanedDict = {}
CleanedDict = {}
TwoTypeDict = {k:[d.get(k) for d in dicts] for k in {k for d in dicts for k in d}}
for key, value in TwoTypeDict.iteritems():
    FullDict.setdefault((int(key)), str(value))
for key, value in FullDict.iteritems():
    PartlyCleanedDict.setdefault((int(key)), value.translate(None, "[]{()\/\'\"<>").replace('}',',}'))
for key, value in PartlyCleanedDict.iteritems():
    CleanedDict.setdefault((int(key)), value.replace(',defaultdicttype int', ''))
print CleanedDict

The print of dd looks like this

defaultdict(<function <lambda> at 0x00000000025C3518>, {1000164: defaultdict(<ty
pe 'int'>, {'1': 12, '3': 5, '2': 17, '4': 10}), 1000103: defaultdict(<type 'int
'>, {'1': 3, '3': 3, '2': 3, '4': 3}), 1000137: defaultdict(<type 'int'>, {'1':
5, '3': 4, '2': 7, '4': 1}), 1000140: defaultdict(<type 'int'>, {'1': 28, '3': 2
6, '2': 33, '4': 8}), 1000143: defaultdict(<type 'int'>, {'1': 1, '3': 3, '2': 1
, '4': 1}), 1000149: defaultdict(<type 'int'>, {'1': 6, '3': 7, '2': 9, '4': 6})
, 1000150: defaultdict(<type 'int'>, {'1': 13, '3': 11, '2': 22, '4': 12}), 1000
132: defaultdict(<type 'int'>, {'1': 2, '3': 4, '2': 4, '4': 1}), 1000155: defau
ltdict(<type 'int'>, {'1': 10, '3': 4, '2': 2, '4': 3}), 1000158: defaultdict(<t
ype 'int'>, {'1': 6, '3': 1, '2': 7, '4': 5})})

Unfortunately my attempts to fully 'clean' the resulting CleanedDict didn't work because here is an example of what the print of CleanedDict looks like (note that I'm only giving 3 keys here and I've changed the names to fit the fruit and vegetables theme I had going with my sample.

{1000132: 'Kiwi, S, B, 500006, Fruit, 3n, defaultdicttype int, 1: 2, 3: 4, 2: 4, 4: 1,}', 1000103: 'Iceberg Lettuce, M, G, 500004, Vegetable, 2n, defaultdicttype int, 1: 3, 3: 3, 2: 3, 4: 3,}',1000137: 'Pineapple, M, Y, 500006, Fruit, 45n,defaultdicttype int, 1: 5, 3: 4, 2: 7, 4: 1,}'}

Upvotes: 1

Views: 2204

Answers (1)

Jon Clements
Jon Clements

Reputation: 142166

You could use a nested defaultdict - I'll leave the fine tuning and handling of more than 4 values and the exact formatting etc... to you...

import csv
from collections import defaultdict

d = {100002: "['Apple', 'M', 'R', '500001', 'Fruit', '90']", 100004: "['Banana', 'M', 'Y', '500001', 'Fruit', '75']"}
dd = defaultdict(lambda: defaultdict(int))
with open('test.csv') as fin:
    csvin = csv.reader(fin)
    for row in csvin:
        dd[int(row[0])][row[1]] += 1

for key in (key for key in dd if key in d):
    counts = [0] * 4
    for idx, val in dd[key].iteritems():
        counts[int(idx) - 1] = int(val)
    print key, d[key], counts

# 100002 ['Apple', 'M', 'R', '500001', 'Fruit', '90'] [2, 0, 2, 0]
# 100004 ['Banana', 'M', 'Y', '500001', 'Fruit', '75'] [0, 1, 2, 0]

Upvotes: 1

Related Questions