Reputation: 4656
I have two numpy arrays: the first one is the values and the second one is the indexes. What I want to do is get the average of the values array based on the indexes array.
For example:
values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me
# [1.5, 3.5, 5]
Here, the values in the indexes array represent the indexes in the final array. Hence:
The first two items in the values array are averaged to form index zero of the final array.
The third and fourth items in the values array are averaged to form index one of the final array.
I do have a Python solution for this, but it is horrible and very slow. Is there a better solution, maybe using numpy or a similar library?
Upvotes: 0
Views: 2871
Reputation: 11
The simplest solution:
import numpy as np
values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])
index_set = np.unique(indexes) # sorted distinct indexes: [0 1 2]
# Select the values that belong to each index in index_set
# and take their average (float() gives plain Python floats)
avg = [float(np.mean(values[indexes==k])) for k in index_set]
print(avg) # [1.5, 3.5, 5.0]
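The same idea can be wrapped into the `get_indexed_avg` function that the question asks for. This is only a sketch: it assumes both inputs are one-dimensional and of equal length, and it uses `np.unique` so the groups come out in sorted index order. It still runs one boolean mask per distinct index, so it may be slow when there are many distinct indexes.

```python
import numpy as np

def get_indexed_avg(values, indexes):
    """Average `values` per distinct entry of `indexes` (boolean-mask approach)."""
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # np.unique returns the distinct indexes in sorted order
    groups = np.unique(indexes)
    # One boolean mask per group selects the values to average
    return np.array([values[indexes == k].mean() for k in groups])

result = get_indexed_avg([1, 2, 3, 4, 5], [0, 0, 1, 1, 2])
print(result.tolist())  # [1.5, 3.5, 5.0]
```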
Upvotes: 1
Reputation: 4656
I wanted to avoid pandas so I spent quite some time figuring it out. The way to do this is by using what's called a one-hot encoding.
Creating a one-hot encoding of the indexes will give us a 2-d array with 1s at places where we want them. For example:
indexes = np.array([0,0,1,1,2])
# one_hot = array(
# [[1., 0., 0.],
# [1., 0., 0.],
# [0., 1., 0.],
# [0., 1., 0.],
# [0., 0., 1.]]
# )
We just need to build a one-hot matrix for the index array and matrix-multiply it with the values to get what we want. This uses the answer from this post.
import numpy as np
values = np.array([1,2,3,4,5])
indexes = np.array([0,0,1,1,2])
one_hot = np.eye(np.max(indexes) + 1)[indexes] # shape (5, 3)
counts = np.sum(one_hot, axis=0) # number of values per index
average = np.sum((one_hot.T * values), axis=1) / counts # per-index sum / count
print(average) # [1.5 3.5 5. ]
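The one-hot matrix has one row per value and one column per index, so it can get large. As a possible alternative (not part of the approach above), `np.bincount` should produce the same per-index sums and counts without materializing that matrix. The sketch below assumes the indexes are non-negative integers and that every index from 0 to the maximum occurs at least once; an index that never occurs would give a 0/0 division, just as in the one-hot version. The function name `indexed_avg_bincount` is made up for illustration.

```python
import numpy as np

def indexed_avg_bincount(values, indexes):
    """Per-index average using np.bincount instead of a one-hot matrix."""
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # Sum of the values that fall on each index (same as one_hot.T @ values)
    sums = np.bincount(indexes, weights=values)
    # Number of values that fall on each index (same as one_hot.sum(axis=0))
    counts = np.bincount(indexes)
    return sums / counts

avg = indexed_avg_bincount([1, 2, 3, 4, 5], [0, 0, 1, 1, 2])
print(avg.tolist())  # [1.5, 3.5, 5.0]
```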
Upvotes: 1
Reputation: 32558
import pandas as pd
pd.Series(values).groupby(indexes).mean()
# OR
# pd.Series(values).groupby(indexes).mean().to_list()
# 0 1.5
# 1 3.5
# 2 5.0
# dtype: float64
Upvotes: 1