CSV: Counting a string in a column if another column has a certain value

Question

I'm having a problem with a csv file where I need some information from. The following is what I need to do:

I have a CSV file that is ordered like this:

bla country bla bla    value 
    Germany               Y
    Germany               Y
    Germany               N 
    Denmark               N
    Denmark               N
    Denmark               Y

Now what I want to do with python is counting every time the Y value is in the same column. So in the end I get something like Germany:2 Denmark:1.

However I've only been able to figure out how to count the columns using the following code:

import csv
from collections import Counter, defaultdict
from itertools import imap
from operator import  itemgetter



header_counter = defaultdict(Counter)

with open('airlines.csv') as input_file:
  r = csv.reader(input_file, delimiter=',')

  headers = next(r)
  for row in r:

      row_val = sum([w.isdigit() for w in row])

      for header, val in zip(headers, row):

          if not any(map(str.isdigit, val)):
              header_counter[header].update({val: row_val})


for k, v in header_counter.iteritems():
   print k, v

I donut think the above code is of much use to anyone though as it only counts the rows per columns and filters out integers. Any help I can get is much appreciated I'm still fairly inexperienced.

totoro · Accepted Answer

I think @smarx's answer is the most beautiful way to do it. Here is a more verbose and pragmatical approach:

import csv

d = {}

with open('airlines.csv', 'r') as f:
    # Sniff the CSV dialect
    dialect = csv.Sniffer().sniff(f.read(1024))

    # Move back to beginning of file
    f.seek(0)

    # DictReader uses the first row in the file as headers.
    r = csv.DictReader(f, dialect=dialect)

    # Plain iteration and counting in a normal dict.
    for row in r:
        # Plain incrementation of the "country" by one if "value" is
        # 'Y'
        if row['value'] == 'Y':
            d[row['country']] = d.get(row['country'], 0) + 1

for k in d:
    print('{} => {}'.format(k, d[k]))

CSV: Counting a string in a column if another column has a certain value

Answers (2)

Related Questions