cresendez744

Reputation: 91

How to find the average value from a list of tuples that share the same key?

I have two lists created from user inputs that I combined into a list of tuples with the following code:

daily_hours_list = [4, 2, 1, None, 3, 5]
week_counter_list = [1, 1, 1, 2, 2, 2]

weekly_hours_list = []
for week, time in zip(week_counter_list, daily_hours_list):
    if time is not None:
        weekly_hours_list.append((week, time))

Which gives me:

weekly_hours_list = [(1, 4),
                     (1, 2),
                     (1, 1),
                     (2, 3),
                     (2, 5)]

I then used this code to sum all hours in week 1 and all hours in week 2:

tup_h = {i:0 for i, v in weekly_hours_list}
for key, value in weekly_hours_list:
    tup_h[key] = tup_h[key]+value
weekly_sum_hours = list(map(tuple, tup_h.items()))

Giving me:

weekly_sum_hours = [(1, 7),
                    (2, 8)]

This all works fine but how do I find the average hours for each week like:

weekly_average_list = [(1, 2.33),
                       (2, 4)]

I imagine I would need to expand on the for loop calculation to account for the number of tuples in week 1 and in week 2, but I'm not sure how to implement that. Thanks in advance for the help.

Upvotes: 1

Views: 307

Answers (1)

Tim

Reputation: 2049

What I think would be helpful is to first collect together the hours for each week. This is easily done with a dictionary where the key is the week number and the value is a list of hours for that week. The standard-library collections module has a data structure called defaultdict that is designed for exactly this situation:

from collections import defaultdict
from statistics import mean

daily_hours_list = [4, 2, 1, None, 3, 5]
week_counter_list = [1, 1, 1, 2, 2, 2]

daily_hours_by_week = defaultdict(list)
for week, time in zip(week_counter_list, daily_hours_list):
    if time is not None:
        daily_hours_by_week[week].append(time)

sum_hours_by_week = {w: sum(hours) for w, hours in daily_hours_by_week.items()}
avg_hours_by_week = {w: mean(hours) for w, hours in daily_hours_by_week.items()}

In our example, using a defaultdict means you don't have to initialize the dictionary with an empty list for each week number (the equivalent of your initial sum of 0 for tup_h). Instead, if we ever try to append an hour to a week that isn't in the dictionary yet, it creates an empty list under that key and then appends to it.
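To make the missing-key behaviour concrete, here is a minimal sketch. The names hours_by_week and plain are made up for illustration, and the setdefault line is just one plain-dict way to get the same effect:

```python
from collections import defaultdict

# defaultdict(list) calls list() to build a value the first time a key is accessed
hours_by_week = defaultdict(list)
hours_by_week[1].append(4)  # key 1 is missing, so [] is created, then 4 is appended
hours_by_week[1].append(2)  # key 1 already exists, so this appends to the same list
hours_by_week[2].append(3)

# Roughly the same thing with an ordinary dict
plain = {}
plain.setdefault(1, []).append(4)

print(dict(hours_by_week))
print(plain)
```

One thing to be aware of: merely reading a missing key (for example hours_by_week[99]) also inserts an empty list, so use .get() or an "in" check if you only want to look.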

Once we've got our hours organised per-week like this, it's really easy to do other processing on them.
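For instance, if you want the result back in the list-of-tuples shape from the question, something along these lines should work. This is a sketch that starts from the question's weekly_hours_list. Sorting by week is my assumption about the order you want:

```python
from collections import defaultdict
from statistics import mean

weekly_hours_list = [(1, 4), (1, 2), (1, 1), (2, 3), (2, 5)]

# Group the hours under their week number
daily_hours_by_week = defaultdict(list)
for week, hours in weekly_hours_list:
    daily_hours_by_week[week].append(hours)

# One (week, average) tuple per week, ordered by week number
weekly_average_list = [
    (week, mean(hours)) for week, hours in sorted(daily_hours_by_week.items())
]
print(weekly_average_list)
```

Note that week 1 sums to 4 + 2 + 1 = 7 over three entries, so its average is 7/3 (about 2.33) rather than a whole number. Wrap the mean in round(..., 2) if you want it shortened.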

We could actually do the last two lines at once and create a single dictionary with a statistics-per-week tuple:

statistics_by_week = {w: (sum(hours), mean(hours)) for w, hours in daily_hours_by_week.items()}
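A small sketch of how such a per-week tuple might be built and then unpacked. The grouped data is hard-coded here so the snippet runs on its own, and the names total and average are only illustrative. The parentheses around the tuple are required inside a dict comprehension:

```python
from statistics import mean

# Hours already grouped by week (hard-coded for this example)
daily_hours_by_week = {1: [4, 2, 1], 2: [3, 5]}

# Each value is a (sum, mean) tuple for that week
statistics_by_week = {
    w: (sum(hours), mean(hours)) for w, hours in daily_hours_by_week.items()
}

# Unpack the tuple directly in the loop header
for week, (total, average) in statistics_by_week.items():
    print(f"week {week}: total={total}, average={average:.2f}")
```

If you end up adding more statistics, a small dict or a namedtuple per week may be easier to read than a growing tuple.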

Read more details on defaultdict here: https://docs.python.org/3/library/collections.html#collections.defaultdict

Upvotes: 4
