Reputation: 2365
I have a list of tuples (x,y) like:
l = [(2,1), (4,6), (3,1), (2,7), (7,10)]
Now I want to make a new list:
l = [(2.5,1), (4,6), (2,7), (7,10)]
where tuples that share the same second value (y) are merged into a single tuple whose first value (x) is the average of their x values.
Here (2,1) and (3,1) share y=1, so the new list contains the average of x=2 and x=3, which is 2.5. No other y value occurs more than once, so the remaining tuples are unchanged.
Upvotes: 4
Views: 1897
Reputation: 150785
Since you tagged pandas:
import pandas as pd
l = [(2,1), (4,6), (3,1), (2,7), (7,10)]
df = pd.DataFrame(l)
Then df is a data frame with two columns:
   0   1
0  2   1
1  4   6
2  3   1
3  2   7
4  7  10
Now you want to compute the average of the numbers in column 0 with the same value in column 1:
(df.groupby(1).mean()       # compute mean on each group
   .reset_index()[[0,1]]    # restore the column order
   .values                  # return the underlying numpy array
)
Output:
array([[ 2.5,  1. ],
       [ 4. ,  6. ],
       [ 2. ,  7. ],
       [ 7. , 10. ]])
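If a list of tuples is wanted rather than a numpy array, the grouped means can be turned back into tuples. This is only a minimal sketch: the column names "x" and "y" are chosen here for readability and are not part of the answer above.

```python
import pandas as pd

l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]

# "x" and "y" are illustrative column names (an assumption, not from the answer)
df = pd.DataFrame(l, columns=["x", "y"])

# Mean of x per distinct y; the result is a Series indexed by y (sorted by y)
means = df.groupby("y")["x"].mean()

# Rebuild (mean_x, y) tuples from the Series
result = [(m, k) for k, m in means.items()]
print(result)
```

Note that groupby sorts the groups by key by default, so the tuples come out ordered by y, not by first appearance.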
Upvotes: 2
Reputation: 53089
Here is a method using numpy.bincount. It relies on the labels being nonnegative integers. (If this is not the case, one can do np.unique(i, return_inverse=True) first.)
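import numpy as np
w,i = zip(*l)                             # w: first elements, i: second elements (labels)
n,d = np.bincount(i,w), np.bincount(i)    # per-label sum of w, per-label count
v, = np.where(d)                          # labels that actually occur
[*zip(n[v]/d[v],v)]
# [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]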
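The np.unique remark above can be sketched as follows for labels that are not nonnegative integers. The input list pairs with string labels is made up purely for illustration.

```python
import numpy as np

# Hypothetical input: the second elements are strings, so bincount cannot use them directly
pairs = [(2, "a"), (4, "c"), (3, "a"), (2, "b")]

w, labels = zip(*pairs)

# uniq: sorted distinct labels; inv: for each input label, its index into uniq
uniq, inv = np.unique(labels, return_inverse=True)

sums = np.bincount(inv, weights=w)   # per-label sum of the first elements
counts = np.bincount(inv)            # per-label number of occurrences

# Convert numpy scalars back to plain Python values
result = [(float(m), str(k)) for m, k in zip(sums / counts, uniq)]
print(result)
```

Because inv contains every index from 0 to len(uniq) - 1 at least once, no count is zero and the np.where step is not needed here.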
Upvotes: 0
Reputation: 1514
Another way using groupby:
from itertools import groupby
# Sort list by the second element
sorted_list = sorted(l,key=lambda x:x[1])
# Group by second element
grouped_list = groupby(sorted_list, key=lambda x:x[1])
result = []
for _,group in grouped_list:
    x,y = list(zip(*group))
    # Take the mean of the first elements
    result.append((sum(x) / len(x),y[0]))
You get:
[(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
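The same idea can be wrapped in a reusable function. This is a sketch, not the answer's own code: the function name average_by_second is made up here, and statistics.fmean (Python 3.8+) replaces the manual sum/len.

```python
from itertools import groupby
from operator import itemgetter
from statistics import fmean

def average_by_second(pairs):
    """Average the first elements of tuples that share a second element."""
    by_y = itemgetter(1)
    # groupby only merges adjacent items, so the input must be sorted by the key first
    ordered = sorted(pairs, key=by_y)
    return [
        (fmean([x for x, _ in group]), y)
        for y, group in groupby(ordered, key=by_y)
    ]

l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]
print(average_by_second(l))
```

Because of the sort, the result is ordered by the second element, which may differ from the order of first appearance in the input.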
Upvotes: 0
Reputation: 2882
First build a hash table (dict) that maps each second element to the list of first elements that occur with it. Then a list comprehension over the items of the dict gives the desired output.
from collections import defaultdict
out = defaultdict(list)
for i in l:
    out[i[1]].append(i[0])
out = [(sum(v)/len(v), k) for k, v in out.items()]
print(out)
#prints [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
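The question's expected output keeps tuples with a unique second value unchanged, for example (4, 6) rather than (4.0, 6). A small variation on the dict approach can do that; the function name average_duplicates is made up for this sketch.

```python
from collections import defaultdict

def average_duplicates(pairs):
    """Average x over tuples sharing the same y; leave unique tuples untouched."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[y].append(x)
    # Only compute a mean when a y value occurs more than once
    return [
        (xs[0] if len(xs) == 1 else sum(xs) / len(xs), y)
        for y, xs in groups.items()
    ]

l = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]
print(average_duplicates(l))  # [(2.5, 1), (4, 6), (2, 7), (7, 10)]
```

Since dicts preserve insertion order (Python 3.7+), the result follows the order in which each second value first appears.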
Upvotes: 0