Reputation: 359

average over multiple entries in a sorted list

I have a sorted 2-dimensional list in which in the first column a specific value can occur multiple times, but with different corresponding values in the second column.

Example:

I'd like to average over those multiple entries, so that my final list looks like

One problem is that I don't know in advance how many times a value occurs. My code so far looks like this:

for i in range(len(list)):
    print i
    if i+1 < len(list):
        if list[i][0] == list[i+1][0]:
            j = 0
            sum = 0
            while list[i][0] == list[i+j][0]:     #this while loop is there to account for the unknown number of multiple values
                sum += list[i+j][1]
                j += 1
            avg = sum / j
            #print avg
            #i+=j                                 # here I try to skip the next j steps in the for loop, but it doesn't work
            #final[i].append(i)
            #final[i].append(avg)                 # How do I append a tuple [i, avg] to the final list?
        else:
            final.append(list[i])
    else:
        final.append(list[i])
print final

My questions are:

How do I properly account for the multiple entries without counting them twice in the for loop?
How do I append a tuple [i, avg] to the final list?

Upvotes: 0

Answers (4)

Jan Vlcinsky

Reputation: 44112

The following code uses groupby from itertools:

lst = [[1, 10],
       [2, 20],
       [3, 30],
       [3, 35],
       [4, 40],
       [5, 45],
       [5, 50],
       [5, 55],
       [6, 60],
       ]
from itertools import groupby

avglst = []
for grpname, grpvalues in groupby(lst, lambda itm: itm[0]):
    values = [itm[1] for itm in grpvalues]
    avgval = float(sum(values)) / len(values)
    avglst.append([grpname, avgval])
print(avglst)

When run:

$ python avglist.py
[[1, 10.0], [2, 20.0], [3, 32.5], [4, 40.0], [5, 50.0], [6, 60.0]]

it provides the result you asked for.

Explanation:

groupby takes an iterable (the list) and a function that calculates the so-called key, a value used for creating groups. In our case we group by the first element of each list item.

Note that groupby starts a new group each time the key value changes, so make sure your input list is sorted; otherwise you get more groups than you expect.
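To see why sorting matters, here is a minimal sketch. The `unsorted` list and the `average_by_key` helper are made up for illustration and are not part of the code above; the helper just wraps the same groupby loop in a function.

```python
from itertools import groupby
from operator import itemgetter


def average_by_key(rows):
    """Average the second column for each run of equal first-column values."""
    result = []
    for key, group in groupby(rows, key=itemgetter(0)):
        values = [row[1] for row in group]
        result.append([key, sum(values) / float(len(values))])
    return result


# Hypothetical data: keys 3 and 5 each appear in two separate runs.
unsorted = [[3, 30], [5, 45], [3, 35], [5, 55]]

split_groups = average_by_key(unsorted)
merged_groups = average_by_key(sorted(unsorted, key=itemgetter(0)))

print(split_groups)   # keys 3 and 5 each show up twice
print(merged_groups)  # one entry per key
```

On the unsorted input every row ends up in its own group, so nothing is averaged; sorting by the first column first gives one averaged entry per key.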

groupby yields tuples (grpname, grpvalues), where grpname is the key value for the given group and grpvalues is an iterator over all items in that group. Be careful: it is not a list. To get a list from it, something (such as a call to list(grpvalues)) must iterate over the values. In our case we iterate using a list comprehension that picks only the second item of each list element.

While iterators, generators and similar constructs in Python might seem too complex at first, they serve excellently when one has to process very large lists and iterables. In such cases, Python iterators hold only the current item in memory, so one can handle really huge or even endless iterables.
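As a rough illustration of that point, the same grouping can run over a generator without building any intermediate list. This is only a sketch: `read_rows` is a hypothetical stand-in for a large, already sorted source such as a file, and `streaming_averages` is a name invented here.

```python
from itertools import groupby
from operator import itemgetter


def read_rows():
    """Hypothetical stand-in for a large sorted source, yielding one row at a time."""
    for row in [(1, 10), (3, 30), (3, 35), (5, 45), (5, 50), (5, 55)]:
        yield row


def streaming_averages(rows):
    """Yield (key, average) pairs, keeping only a running sum and count per group."""
    for key, group in groupby(rows, key=itemgetter(0)):
        total = 0.0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total / count


averages = list(streaming_averages(read_rows()))
print(averages)
```

The final list() call is only there to print the result; a consumer that handles one (key, average) pair at a time would never need all rows in memory at once.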

Upvotes: 2

L3viathan

Reputation: 27283

Here's how you can do it with a combination of Counter and OrderedDict:

from __future__ import division  # Python 2
from collections import Counter, OrderedDict
counts, sums = OrderedDict(), Counter()
for left, right in [(1,10), (2,20), (3,30), (4,40), (5,45), (5,50), (5,55)]:
    counts[left] = counts.get(left, 0) + 1
    sums[left] += right

result = [(key, sums[key]/counts[key]) for key in counts]

Upvotes: 1

Evan Fosmark

Reputation: 101701

First we need to group the columns together. We'll do this with a dictionary where the key is the left column and the value is a list of the values for that key. Then, we can do a simple calculation to get the averages.

from collections import defaultdict

data = [
    (1, 10),
    (2, 20),
    (3, 30),
    (3, 35),
    (4, 40),
    (5, 45),
    (5, 50),
    (5, 55),
    (6, 60)
]

# Organize the data into a dict
d = defaultdict(list)
for key, value in data:
    d[key].append(value)

# Calculate the averages
averages = dict()
for key in d:
    averages[key] = sum(d[key]) / float(len(d[key]))

# Use the averages
print(averages)

Output:

{1: 10.0, 2: 20.0, 3: 32.5, 4: 40.0, 5: 50.0, 6: 60.0}

Upvotes: 1

Garrett R

Reputation: 2662

You can use a dictionary to count how many times each value in the left column occurs, and a separate dictionary to hold the sum of the elements associated with each left entry. Then, with one final for loop, divide the sum by the count.

from collections import defaultdict
someList = [(1,10), (2,20), (3,30), (4,40), (5,45), (5,50), (5,55)]
count_dict = defaultdict(lambda:0)
sum_dict = defaultdict(lambda:0.0)
for left_val, right_val in someList:
    count_dict[left_val] += 1
    sum_dict[left_val] += right_val

for left_val in sorted(count_dict):
    print left_val, sum_dict[left_val]/count_dict[left_val]

Output

Upvotes: 1
