Counting attribute values from CSV?

Question

I want to write a function that returns the count of each attribute value in a CSV file. The output should be one dictionary per attribute, where the keys are the distinct attribute values and the associated values are the number of times each value occurs in the data.

For example, I have the following CSV file (the first line is the header):

First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12

and I would like to get the following output:

for first name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
for the second name: {'Jackson': 2, 'Got': 2}
and for the age: {22: 2, 50: 1, 12: 1}

I think I could do it using the Counter class from Python's collections module together with csv.DictReader, so that each row is a dictionary as well. But I still can't get it to work. Does anyone know whether this is possible? Here is what I have tried so far. :)

import csv
import os
import collections

FIRSTNAME_ATT = 'First_Name'
LASTNAME_ATT = 'Last_Name'
AGE_ATT = 'Age'


def count_attributes(file_name):
    firstname_counts = {}
    lastname_counts = {}
    age_counts = {}

    with open(file_name, encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            for i, val in enumerate(row):
                count_number[i][val] += 1
# Here I don't get any further :(
    return firstname_counts, lastname_counts, age_counts


if __name__ == '__main__':
    data_file = os.path.join("..", "data", "thecsvfile.csv")
    firstname_counts, lastname_counts, age_counts = count_attributes(data_file)
    print(firstname_counts)
    print(lastname_counts)
    print(age_counts)

It would be great if anyone has a hint or an idea for how to solve this. :)

martineau · Accepted Answer

In addition to collections.Counter, you could use a collections.OrderedDict to keep things simple and make the processing largely "data-driven", in the sense that the contents of the csv file itself determine what the attributes are (instead of hardcoding their names).

The use of OrderedDict preserves the order of the attributes in the csv file's header row.

Here's what I'm saying:

import os
import csv
from collections import Counter, OrderedDict

def count_attributes(file_name):
    with open(file_name, encoding='utf-8', newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        counters = OrderedDict((attr, Counter()) for attr in reader.fieldnames)
        for row in reader:
            for attr, value in row.items():
                counters[attr][value] += 1

    return counters

if __name__ == '__main__':
    # data_file = os.path.join("..", "data", "thecsvfile.csv")
    data_file = "thecsvfile.csv"  # Slight simplification for testing.
    for attr, counts in count_attributes(data_file).items():
        print('{}: {}'.format(attr.replace('_', ' '), dict(counts)))

Output:

First Name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
Last Name: {'Got': 2, 'Jackson': 2}
Age: {'22': 2, '50': 1, '12': 1}
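Note that the Age keys above are strings ('22' rather than 22), because the csv module returns every field as text. If you want integer keys as in the question's desired output, one option is to convert values before counting. The following is only a minimal sketch, not part of the original answer: the function name count_attributes_typed and its converters parameter are hypothetical names made up for illustration, and it takes an open file object instead of a file name so it can be tried on in-memory data. It also uses a plain dict, which preserves insertion order in Python 3.7 and later, in place of OrderedDict.

```python
import csv
import io
from collections import Counter


def count_attributes_typed(csv_file, converters=None):
    """Count the values in each column of an already opened CSV file.

    `converters` is a hypothetical optional mapping of column name to a
    callable (for example {'Age': int}) applied to each value before it
    is counted. Columns without a converter are counted as strings.
    """
    converters = converters or {}
    reader = csv.DictReader(csv_file)
    # One Counter per header column, in header order.
    counters = {attr: Counter() for attr in (reader.fieldnames or [])}
    for row in reader:
        for attr, counter in counters.items():
            value = row[attr]
            if attr in converters:
                # Assumes every value in this column is convertible;
                # a blank or missing field would raise an exception here.
                value = converters[attr](value)
            counter[value] += 1
    return counters


# In-memory copy of the sample data from the question.
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

if __name__ == '__main__':
    counts = count_attributes_typed(io.StringIO(SAMPLE), converters={'Age': int})
    for attr, counter in counts.items():
        print('{}: {}'.format(attr.replace('_', ' '), dict(counter)))
```

With the sample data this should give integer keys for Age, e.g. {22: 2, 50: 1, 12: 1}, while the name columns stay as strings. To read a real file, pass the object returned by open(file_name, encoding='utf-8', newline='') instead of the io.StringIO wrapper.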
