Reputation: 15
I want to write a function that counts the occurrences of each attribute value in a CSV file. The output should be one dictionary per attribute, where the keys are the distinct attribute values and the values are the number of times each one occurs in the data.
For example, I have the following CSV file (the first line is the header):
First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
and I would like to get this as output:
for the first name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
for the last name: {'Jackson': 2, 'Got': 2}
and for the age: {22: 2, 50: 1, 12: 1}
I think I could do it using the Counter class from Python's collections module, together with csv.DictReader so that each row is a dictionary as well. But I still can't get it to work. Does anyone know whether this is possible? Here is what I have tried so far. :)
import csv
import os
import collections

FIRSTNAME_ATT = 'First_Name'
LASTNAME_ATT = 'Last_Name'
AGE_ATT = 'Age'

def count_attributes(file_name):
    firstname_counts = {}
    lastname_counts = {}
    age_counts = {}
    with open(file_name, encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            for i, val in enumerate(row):
                count_number[i][val] += 1
                # Here I don't get any further :(
    return firstname_counts, lastname_counts, age_counts

if __name__ == '__main__':
    data_file = os.path.join("..", "data", "thecsvfile.csv")
    firstname_counts, lastname_counts, age_counts = count_attributes(data_file)
    print(firstname_counts)
    print(lastname_counts)
    print(age_counts)
It would be great if anyone has a hint or an idea how to solve this. :)
Upvotes: 1
Views: 2030
Reputation: 123473
In addition to collections.Counter, you could use a collections.OrderedDict to keep things simple and make the processing largely "data-driven", in the sense that the contents of the csv file itself determine what the attributes are (instead of hardcoding their names).
The use of OrderedDict preserves the order of the attributes in the csv file's header row.
Here's what I mean:
import os
import csv
from collections import Counter, OrderedDict

def count_attributes(file_name):
    with open(file_name, encoding='utf-8', newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        counters = OrderedDict((attr, Counter()) for attr in reader.fieldnames)
        for row in reader:
            for attr, value in row.items():
                counters[attr][value] += 1
    return counters

if __name__ == '__main__':
    # data_file = os.path.join("..", "data", "thecsvfile.csv")
    data_file = "thecsvfile.csv"  # Slight simplification for testing.
    for attr, counts in count_attributes(data_file).items():
        print('{}: {}'.format(attr.replace('_', ' '), dict(counts)))
Output:
First Name: {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
Last Name: {'Got': 2, 'Jackson': 2}
Age: {'22': 2, '50': 1, '12': 1}
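Note that the ages come back as strings ('22' rather than 22), because the csv module reads every field as text. If integer keys are wanted, as in the question's expected output, one option is to convert values that look numeric before counting. Below is a minimal sketch of that idea, assuming the question's sample data is fed in through io.StringIO instead of a file. The names `convert` and `count_attributes_typed` are hypothetical helpers made up for this illustration, and a plain dict is used because dicts keep insertion order on Python 3.7+.

```python
import csv
import io
from collections import Counter

# Sample data from the question, held in memory so the sketch is self-contained.
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

def convert(value):
    """Return value as an int if it parses as one, otherwise unchanged (hypothetical helper)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return value

def count_attributes_typed(csv_file):
    """Count values per column of an open CSV file object, converting numeric strings to int."""
    reader = csv.DictReader(csv_file)
    counters = {attr: Counter() for attr in reader.fieldnames}
    for row in reader:
        for attr, value in row.items():
            counters[attr][convert(value)] += 1
    return counters

counters = count_attributes_typed(io.StringIO(SAMPLE))
print(dict(counters['Age']))  # {22: 2, 50: 1, 12: 1}
```

Whether automatic conversion is appropriate depends on the data: a column of ZIP codes or IDs with leading zeros, for instance, would be better left as text.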
Upvotes: 0
Reputation: 3664
Solution:
firstname_counts = {}
lastname_counts = {}
age_counts = {}
with open(file_name, encoding='utf-8') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        firstname_counts[row['First_Name']] = firstname_counts.get(row['First_Name'], 0) + 1
        lastname_counts[row['Last_Name']] = lastname_counts.get(row['Last_Name'], 0) + 1
        # similar for age...
You just need to handle the case where the key is not yet in the dictionary. The dictionary's .get method does this: it returns the current count if the key exists, or the default 0 if it does not, and then you add 1.
Ref: dict .get method
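To see the .get pattern in isolation, here is a small sketch that counts a plain list of names rather than CSV rows. The list `names` is made-up illustration data taken from the question's first column.

```python
# Illustration data: the First_Name column from the question's sample file.
names = ['Johnny', 'Michael', 'Johnny', 'Andrea']

counts = {}
for name in names:
    # .get returns the stored count, or 0 the first time a name is seen.
    counts[name] = counts.get(name, 0) + 1

print(counts)  # {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
```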
EDIT:
Solution 2 (using collections.Counter):
from collections import Counter
firstname_counts = Counter()
lastname_counts = Counter()
age_counts = Counter()
# same code as in the above solution.
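With Counter the loop body can also be shortened, because a Counter treats a missing key as 0, so `+= 1` works without .get. The following is a sketch of how that might look, assuming the question's sample data is read from an in-memory string (`SAMPLE` is introduced here only for the illustration) rather than from the file on disk.

```python
import csv
import io
from collections import Counter

# Sample data from the question, held in memory so the sketch is self-contained.
SAMPLE = """First_Name,Last_Name,Age
Johnny,Got,22
Michael,Jackson,22
Johnny,Jackson,50
Andrea,Got,12
"""

firstname_counts = Counter()
lastname_counts = Counter()
age_counts = Counter()

reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    # A Counter returns 0 for keys it has not seen, so no .get is needed.
    firstname_counts[row['First_Name']] += 1
    lastname_counts[row['Last_Name']] += 1
    age_counts[row['Age']] += 1

print(dict(firstname_counts))  # {'Johnny': 2, 'Michael': 1, 'Andrea': 1}
print(dict(lastname_counts))   # {'Got': 2, 'Jackson': 2}
print(dict(age_counts))        # {'22': 2, '50': 1, '12': 1}
```

The age keys are strings here because the csv module does not convert field types.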
Upvotes: 1