Underwood

Reputation: 3

How to get the median of a CSV column using Python

My CSV file looks like this (a header row followed by one row per student):

Last Name, First Name, Student No, A1, A2, A3, A4
Squarepants, Spongebob, 9911991199, 99, 88, 77, 66
Star, Patrick, 9912334456, 11, 22, 33, 44
Tentacles, Squidward, 9913243567, 78, 58, 68, 88

I want a function that returns a list of the medians of the assignment columns (A1, A2, ...), in column order. The number of assignments can vary, so the function should handle any number of assignment columns.

Thank you

Upvotes: 0

Answers (1)

RoadRunner

Reputation: 26335

How about collecting each column's values in a collections.defaultdict(list), and then applying statistics.median() to the list of values for each assignment column:

from csv import reader
from statistics import median
from collections import defaultdict

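# Map each column header to the list of values found in that column
data = defaultdict(list)
with open('data.csv', newline='') as file:
    csv_reader = reader(file)
    # The first row holds the headers; strip the space that follows each comma
    headers = list(map(str.strip, next(csv_reader)))

    for line in csv_reader:
        for col, value in enumerate(map(str.strip, line)):
            data[headers[col]].append(value)

# Keep only the assignment columns (headers starting with 'A') and take each median
medians = {k: median(map(float, v)) for k, v in data.items() if k.startswith('A')}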

print(medians)

Which outputs a dictionary of medians:

{'A1': 78.0, 'A2': 58.0, 'A3': 68.0, 'A4': 66.0}

UPDATE:

As requested, you may also get a list of medians like so:

print(list(medians.values()))
# [78.0, 58.0, 68.0, 66.0]
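
Since the question asks for a function, here is a minimal sketch of one way to wrap this up. It is an alternative to the dictionary approach rather than a rewrite of it: it transposes the assignment columns with zip() instead of grouping by header. The function name assignment_medians and the parameter first_assignment_col are made up for illustration, and the sketch assumes the assignment columns are the last columns in the file, starting at index 3 (after Last Name, First Name and Student No), with a numeric value in every cell.

```python
import csv
from statistics import median


def assignment_medians(path, first_assignment_col=3):
    """Return the median of each assignment column, in file order.

    Assumption (not from the question): every column from index
    `first_assignment_col` onward is an assignment column.
    """
    with open(path, newline='') as f:
        rows = csv.reader(f)
        next(rows, None)  # skip the header row (no error if the file is empty)

        # Slice off the non-assignment columns, ignoring blank lines,
        # then transpose so that each tuple holds one assignment's scores.
        # Note: zip() stops at the shortest row, so ragged rows drop columns.
        columns = zip(*(row[first_assignment_col:] for row in rows if row))

        # float() tolerates the leading space left after each comma
        return [median(map(float, col)) for col in columns]
```

For the sample data in the question, assignment_medians('data.csv') should return [78.0, 58.0, 68.0, 66.0]. Because the columns are found by position rather than by name, this keeps working when more assignments are added, but it would need adjusting if other columns come after the assignments.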

Upvotes: 1
