Reputation: 359
I have a sorted 2-dimensional list in which a value can occur multiple times in the first column, each time with a different corresponding value in the second column.
Example:
1 10
2 20
3 30
3 35
4 40
5 45
5 50
5 55
6 60
I'd like to average over those multiple entries, so that my final list looks like
1 10
2 20
3 32.5
4 40
5 50
6 60
One problem is that you don't know how many times a value occurs. My code so far looks like this:
for i in range(len(list)):
    print i
    if i+1 < len(list):
        if list[i][0] == list[i+1][0]:
            j = 0
            sum = 0
            while list[i][0] == list[i+j][0]: #this while loop is there to account for the unknown number of multiple values
                sum += list[i+j][1]
                j += 1
            avg = sum / j
            #print avg
            #i+=j # here I try to skip the next j steps in the for loop, but it doesn't work
            #final[i].append(i)
            #final[i].append(avg) # How do I append a tuple [i, avg] to the final list?
        else:
            final.append(list[i])
    else:
        final.append(list[i])
print final
My questions are:
Upvotes: 0
Views: 100
Reputation: 44112
The following code uses groupby from itertools:
from itertools import groupby

lst = [[1, 10],
       [2, 20],
       [3, 30],
       [3, 35],
       [4, 40],
       [5, 45],
       [5, 50],
       [5, 55],
       [6, 60],
       ]
avglst = []
for grpname, grpvalues in groupby(lst, lambda itm: itm[0]):
    values = [itm[1] for itm in grpvalues]
    avgval = float(sum(values)) / len(values)
    avglst.append([grpname, avgval])
print(avglst)
When run:
$ python avglist.py
[[1, 10.0], [2, 20.0], [3, 32.5], [4, 40.0], [5, 50.0], [6, 60.0]]
it provides the result you asked for.
Explanation:
groupby takes an iterable (the list) and a function that computes a so-called key, a value used for creating groups. In our case we group by the first element of each list item.
Note that groupby starts a new group each time the key value changes, so make sure your input list is sorted; otherwise you get more groups than you expect.
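To illustrate the point about sorting, here is a small sketch that compares the group keys produced for unsorted and sorted input. The helper name group_keys and the sample rows are made up for this illustration.

```python
from itertools import groupby

def group_keys(rows):
    """Return the key of every group that groupby produces, in order."""
    return [key for key, _ in groupby(rows, key=lambda row: row[0])]

# Hypothetical sample data, deliberately not sorted by the first column.
unsorted_rows = [[3, 30], [5, 45], [3, 35], [5, 50]]

# Unsorted input: keys 3 and 5 each start a new group twice.
keys_unsorted = group_keys(unsorted_rows)

# Sorted by the first column: one group per distinct key.
keys_sorted = group_keys(sorted(unsorted_rows, key=lambda row: row[0]))

print(keys_unsorted)  # [3, 5, 3, 5]
print(keys_sorted)    # [3, 5]
```

If the input is not guaranteed to be sorted, sorting it with the same key function before calling groupby avoids the split groups.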
groupby returns tuples (grpname, grpvalues), where grpname is the key value for the given group and grpvalues is an iterator over all items in that group. Be careful: it is not a list. To get a list from it, something (like a call to list(grpvalues)) must iterate over the values.
In our case we iterate using a list comprehension that picks only the second item of each list element.
While iterators, generators and similar constructs in Python might seem too complex at first, they serve excellently when one has to process very large lists and iterables. In such cases, Python iterators hold only the current item in memory, so one can handle really huge or even endless iterables.
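As a rough sketch of that idea, the averaging can be written so that neither the input nor the values of a group are ever collected into a list. The names read_pairs and averaged are hypothetical; read_pairs stands in for any source that yields pairs already sorted by key, such as lines read from a large file.

```python
from itertools import groupby

def read_pairs():
    """Hypothetical source yielding (key, value) pairs one at a time, sorted by key."""
    for pair in [(1, 10), (2, 20), (3, 30), (3, 35), (4, 40)]:
        yield pair

def averaged(pairs):
    """Yield (key, average) for each run of equal keys, keeping only running totals."""
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = 0.0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total / count

# list() is used here only to display the result; a real consumer could loop instead.
result = list(averaged(read_pairs()))
print(result)  # [(1, 10.0), (2, 20.0), (3, 32.5), (4, 40.0)]
```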
Upvotes: 2
Reputation: 27283
Here's how you can do it with a combination of Counter and OrderedDict:
from __future__ import division # Python 2
from collections import Counter, OrderedDict
counts, sums = OrderedDict(), Counter()
for left, right in [(1,10), (2,20), (3,30), (4,40), (5,45), (5,50), (5,55)]:
    counts[left] = counts.get(left, 0) + 1
    sums[left] += right
result = [(key, sums[key]/counts[key]) for key in counts]
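The snippet above leaves out the (3, 35) and (6, 60) rows from the question and does not print its result. Here is a sketch of the same idea wrapped in a function (the name average_by_key is made up here) and run on the question's full data. It uses float() instead of the __future__ import so that it behaves the same on Python 2 and 3.

```python
from collections import Counter, OrderedDict

def average_by_key(pairs):
    """Average the right-hand values per left-hand key, keeping first-seen key order."""
    counts, sums = OrderedDict(), Counter()
    for left, right in pairs:
        counts[left] = counts.get(left, 0) + 1
        sums[left] += right
    return [(key, sums[key] / float(counts[key])) for key in counts]

data = [(1, 10), (2, 20), (3, 30), (3, 35), (4, 40),
        (5, 45), (5, 50), (5, 55), (6, 60)]
result = average_by_key(data)
print(result)
# [(1, 10.0), (2, 20.0), (3, 32.5), (4, 40.0), (5, 50.0), (6, 60.0)]
```

Unlike groupby, this does not require the input to be sorted, because the counts and sums are keyed by value rather than by position.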
Upvotes: 1
Reputation: 101701
First we need to group the columns together. We'll do this with a dictionary where the key is the left column and the value is a list of the values for that key. Then, we can do a simple calculation to get the averages.
from collections import defaultdict
data = [
    (1, 10),
    (2, 20),
    (3, 30),
    (3, 35),
    (4, 40),
    (5, 45),
    (5, 50),
    (5, 55),
    (6, 60)
]
# Organize the data into a dict
d = defaultdict(list)
for key, value in data:
    d[key].append(value)
# Calculate the averages
averages = dict()
for key in d:
    averages[key] = sum(d[key]) / float(len(d[key]))
# Use the averages
print(averages)
Output:
{1: 10.0, 2: 20.0, 3: 32.5, 4: 40.0, 5: 50.0, 6: 60.0}
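Since the question asks for a list rather than a dict, the result can be turned back into a sorted list of pairs. A minimal sketch, starting from the dict printed above (the name final is borrowed from the question):

```python
# The averages dict as printed in the output above.
averages = {1: 10.0, 2: 20.0, 3: 32.5, 4: 40.0, 5: 50.0, 6: 60.0}

# sorted() iterates over the keys in ascending order, so the list
# comes out ordered by the first column regardless of dict ordering.
final = [[key, averages[key]] for key in sorted(averages)]
print(final)
# [[1, 10.0], [2, 20.0], [3, 32.5], [4, 40.0], [5, 50.0], [6, 60.0]]
```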
Upvotes: 1
Reputation: 2662
You can use one dictionary to count how many times each value in the left column occurs, and a separate dictionary to hold the sum of the elements associated with each left entry. Then, with one final for loop, divide the sum by the count.
from collections import defaultdict
someList = [(1,10), (2,20), (3,30), (4,40), (5,45), (5,50), (5,55)]
count_dict = defaultdict(lambda:0)
sum_dict = defaultdict(lambda:0.0)
for left_val, right_val in someList:
    count_dict[left_val] += 1
    sum_dict[left_val] += right_val
for left_val in sorted(count_dict):
    print left_val, sum_dict[left_val]/count_dict[left_val]
1 10.0
2 20.0
3 30.0
4 40.0
5 50.0
Upvotes: 1