Reputation: 391
I'm having a problem with a csv file where I need some information from. The following is what I need to do:
I have a CSV file that is ordered like this:
bla country bla bla value
Germany Y
Germany Y
Germany N
Denmark N
Denmark N
Denmark Y
Now what I want to do with python is counting every time the Y value is in the same column. So in the end I get something like Germany:2 Denmark:1.
However I've only been able to figure out how to count the columns using the following code:
import csv
from collections import Counter, defaultdict
from itertools import imap
from operator import itemgetter
header_counter = defaultdict(Counter)
with open('airlines.csv') as input_file:
r = csv.reader(input_file, delimiter=',')
headers = next(r)
for row in r:
row_val = sum([w.isdigit() for w in row])
for header, val in zip(headers, row):
if not any(map(str.isdigit, val)):
header_counter[header].update({val: row_val})
for k, v in header_counter.iteritems():
print k, v
I donut think the above code is of much use to anyone though as it only counts the rows per columns and filters out integers. Any help I can get is much appreciated I'm still fairly inexperienced.
Upvotes: 0
Views: 777
Reputation: 2456
I think @smarx's answer is the most beautiful way to do it. Here is a more verbose and pragmatical approach:
import csv
d = {}
with open('airlines.csv', 'r') as f:
# Sniff the CSV dialect
dialect = csv.Sniffer().sniff(f.read(1024))
# Move back to beginning of file
f.seek(0)
# DictReader uses the first row in the file as headers.
r = csv.DictReader(f, dialect=dialect)
# Plain iteration and counting in a normal dict.
for row in r:
# Plain incrementation of the "country" by one if "value" is
# 'Y'
if row['value'] == 'Y':
d[row['country']] = d.get(row['country'], 0) + 1
for k in d:
print('{} => {}'.format(k, d[k]))
Upvotes: 0
Reputation: 60143
Is this what you're looking for?
import csv
from collections import Counter
data = '''country,value
Germany,Y
Germany,Y
Germany,N
Denmark,N
Denmark,N
Denmark,Y'''
r = csv.DictReader(data.split('\n'))
counter = Counter(
row.get('country')
for row in r
if row.get('value') == 'Y')
for k, v in counter.items():
print('{}: {}'.format(k, v))
Upvotes: 1