Reputation: 605
I have a csv file that has data as the following:
TaskId | Attr. 1 | Attr. 2 | Attr. 3
123 23 twothree xyx
123 23 four lor
456 23 four pop
123 23 twothree xyx
352 34 some lkj
I want to produce a dictionary (or even just print) with attribute and frequency based on the task id.
Expected output:
For task id 123,
23: 3 times
four: 1 times
twothree: 2 times
xyx: 2 times
lor: 1 time
I tried the following:
import csv
from collections import Counter
from itertools import imap
from operator import itemgetter
with open('task.csv') as f:
data = csv.reader(f)
for row in data:
if row[0] == '123':
cn = Counter(imap(itemgetter(2), row))
for t in cn.iteritems():
print("{} appears {} times".format(*t))
But it did not work. In
Counter(imap(itemgetter(2), row))
instead of row
and condition, I supplied data
and it showed a particular column's items frequency correctly. But I want it based on a condition. How can this be done?
Upvotes: 2
Views: 72
Reputation: 316
It may be quicker to use pandas:
import pandas as pd
df = pd.read_csv('task.csv') # open the file
df['count'] = 0 # add an extra column to count group value occurrences
counts = df.groupby(by = ['TaskId'], as_index = False, sort = False).count() # counts non blank values of the group
display(counts) # shows you the output
Upvotes: 0
Reputation: 1014
If you don't want to use Pandas, this can be done easily with a dictionary:
import csv
from tabulate import tabulate
uniquekeys = {}
with open('data') as f:
data = csv.reader(f)
next(data, None) # skip the headers
for row in data:
key = str(row[0]+":"+row[1])
uniquekeys[key] = uniquekeys.get(key, 0) + 1
print(uniquekeys)
Alternatively, This could be done easily without python too:
cat data |awk -F',' 'NR > 1{print $1":"$2}'|sort|uniq -c
Upvotes: 0
Reputation: 164693
You can use collections.defaultdict
to create a nested dictionary:
from io import StringIO
import csv
from collections import defaultdict
mystr = StringIO("""TaskId,Attr. 1,Attr. 2,Attr. 3
123,23,twothree,xyx
123,23,four,lor
456,23,four,pop
123,23,twothree,xyx
352,34,some,lkj""")
d = defaultdict(lambda: defaultdict(int))
# replace mystr with open('file.csv', 'r')
with mystr as fin:
for item in csv.DictReader(fin):
d[int(item['TaskId'])][int(item['Attr. 1'])] += 1
d[int(item['TaskId'])][item['Attr. 2']] += 1
d[int(item['TaskId'])][item['Attr. 3']] += 1
print(d)
defaultdict({123: defaultdict(int, {23: 3, 'twothree': 2, 'xyx': 2,
'four': 1, 'lor': 1}),
352: defaultdict(int, {34: 1, 'some': 1, 'lkj': 1}),
456: defaultdict(int, {23: 1, 'four': 1, 'pop': 1})})
Then iterate as you would a normal dictionary:
for k, v in d.items():
print('TaskId: {0}'.format(k))
for a, b in v.items():
print('{0}: {1} times'.format(a, b))
Result:
TaskId: 123
23: 3 times
twothree: 2 times
xyx: 2 times
four: 1 times
lor: 1 times
TaskId: 456
23: 1 times
four: 1 times
pop: 1 times
TaskId: 352
34: 1 times
some: 1 times
lkj: 1 times
Upvotes: 1