How to get the median of a CSV column using Python

Question

My CSV file looks like this (header row first):

Last Name, First Name, Student No, A1, A2, A3, A4
Squarepants, Spongebob, 9911991199, 99, 88, 77, 66
Star, Patrick, 9912334456, 11, 22, 33, 44
Tentacles, Squidward, 9913243567, 78, 58, 68, 88

My function should return a list of the assignment (A) medians, in column order. The number of assignments can vary, so the function must handle any number n of assignment columns.

Thank you

RoadRunner · Accepted Answer

One approach is to collect each column's values in a collections.defaultdict(list) keyed by header, and then apply statistics.median() to each assignment column:

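from csv import reader
from statistics import median
from collections import defaultdict

# Map each column header to the list of values in that column
data = defaultdict(list)
with open('data.csv', newline='') as file:
    csv_reader = reader(file)
    # Strip the space that follows each comma in the header
    headers = list(map(str.strip, next(csv_reader)))

    for line in csv_reader:
        # zip() pairs each value with its header (blank lines yield nothing)
        for header, value in zip(headers, map(str.strip, line)):
            data[header].append(value)

# Keep only the assignment columns (A1, A2, ...) and take the median of each
medians = {k: median(map(float, v)) for k, v in data.items() if k.startswith('A')}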

print(medians)

This prints a dictionary that maps each assignment column to its median:

{'A1': 78.0, 'A2': 58.0, 'A3': 68.0, 'A4': 66.0}
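Note that statistics.median() returns the middle value only when a column has an odd number of entries; with an even number of students it returns the mean of the two middle values. A quick check (the four-value list is made-up data for illustration):

```python
from statistics import median

# Odd number of values: the middle value after sorting (11, 78, 99)
print(median([99.0, 11.0, 78.0]))        # 78.0
# Even number of values: mean of the two middle values (58.0 and 78.0)
print(median([99.0, 11.0, 78.0, 58.0]))  # 68.0
```

If you need the result to be an actual score from the column, statistics.median_low() and statistics.median_high() always return one of the data points.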

UPDATE:

As requested, you can also get the medians as a list. Dictionaries preserve insertion order (Python 3.7+), so the values come out in column order:

print(list(medians.values()))
# [78.0, 58.0, 68.0, 66.0]
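Since the question asks for a function that returns a list for any number of assignments, here is a minimal sketch wrapping the same idea. The name assignment_medians is hypothetical, and it assumes an assignment column is any header of the form "A" followed by digits (A1, A2, ..., A12), which is stricter than startswith('A') and is only a guess based on the sample header:

```python
from csv import reader
from statistics import median


def assignment_medians(lines):
    """Return the median of every assignment column, in column order.

    `lines` is any iterable of CSV text lines (an open file works).
    Assumption: assignment headers look like 'A' followed by digits.
    """
    csv_reader = reader(lines, skipinitialspace=True)
    headers = [h.strip() for h in next(csv_reader)]
    # Indices of the assignment columns, however many there are
    cols = [i for i, h in enumerate(headers) if h[:1] == 'A' and h[1:].isdigit()]

    columns = [[] for _ in cols]
    for row in csv_reader:
        if not row:  # skip blank lines
            continue
        for values, i in zip(columns, cols):
            values.append(float(row[i]))

    return [median(values) for values in columns]


sample = [
    "Last Name, First Name, Student No, A1, A2, A3, A4",
    "Squarepants, Spongebob, 9911991199, 99, 88, 77, 66",
    "Star, Patrick, 9912334456, 11, 22, 33, 44",
    "Tentacles, Squidward, 9913243567, 78, 58, 68, 88",
]
print(assignment_medians(sample))  # [78.0, 58.0, 68.0, 66.0]
```

To read the real file, pass the open file object: with open('data.csv', newline='') as file: print(assignment_medians(file)).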
