janecs

Reputation: 3

Calculate percentage of a CSV column in Python

I have this csv file about logged hours by users that looks roughly like this, but it's much larger (more users and projects):

User,Project,Hours
User1,ProjectA,5
User1,ProjectB,10
User2,ProjectA,7
User2,ProjectB,12

I already have code that prints the total logged hours across all users. It can also print the rows for a single user, plus a line with that user's total hours.

What I want now is to use a user's total hours to calculate what percentage of that total each project accounts for. For example, what percentage of User1's time went to ProjectA? Can anyone help? I've been trying to figure this out but haven't managed to so far. I'm quite new to Python, so any hints or help are really appreciated.

Thanks in advance!

Upvotes: 0

Views: 2027

Answers (1)

Katriel

Reputation: 123662

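import csv
import collections

with open(<...>) as data_file:
    # Missing users start at 0, so hours can be accumulated directly.
    total_hours = collections.defaultdict(int)
    # DictReader uses the header row (User,Project,Hours) as the keys.
    for row in csv.DictReader(data_file):
        total_hours[row['User']] += int(row['Hours'])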

Or you could just read the data into a dictionary user -> project -> time and use that:

import functools

with open(<...>) as data_file:
    data = collections.defaultdict(
        functools.partial(collections.defaultdict, int))
    for row in csv.DictReader(data_file):
        data[row['User']][row['Project']] += int(row['Hours'])

and then

total_hours = {user: sum(time.values()) for user, time in data.items()}
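
To get from the totals to the percentages asked about in the question, one possible approach is to divide each project's hours by the user's total. The following is only a sketch: the function name `project_percentages`, the `SAMPLE` string, and the use of `io.StringIO` in place of the real file are assumptions made for illustration, and the rows are the four sample rows from the question.

```python
import collections
import csv
import functools
import io

# Sample rows copied from the question. This stands in for the real file,
# which a real script would open with open(...) instead (assumption).
SAMPLE = """User,Project,Hours
User1,ProjectA,5
User1,ProjectB,10
User2,ProjectA,7
User2,ProjectB,12
"""


def project_percentages(csv_file):
    """Return {user: {project: percentage of that user's total hours}}.

    Hypothetical helper name, not part of any library.
    """
    # Nested mapping user -> project -> hours, as in the answer above.
    data = collections.defaultdict(
        functools.partial(collections.defaultdict, int))
    for row in csv.DictReader(csv_file):
        data[row['User']][row['Project']] += int(row['Hours'])

    percentages = {}
    for user, projects in data.items():
        total = sum(projects.values())
        percentages[user] = {
            # Guard against a user whose logged hours sum to zero.
            project: (100.0 * hours / total) if total else 0.0
            for project, hours in projects.items()
        }
    return percentages


percentages = project_percentages(io.StringIO(SAMPLE))
for user, projects in percentages.items():
    for project, pct in projects.items():
        print(f"{user} {project}: {pct:.1f}%")
```

With the sample rows, User1 logged 5 of 15 hours on ProjectA, so that entry comes out at roughly 33.3%.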

Upvotes: 1
