Reputation: 1037
I have these numpy arrays:
array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])
Now I want to return a 2D array containing the mean of the array2 values for each distinct value in array1, so my output would look something like this:
array([[-1, 22.7],
[ 1, 55.7],
[ 2, 33.3]])
Is there an efficient way without concatenating those 1D arrays to one 2D array? Thanks!
Upvotes: 2
Views: 983
Reputation: 10759
This is a typical grouping operation, and the numpy_indexed package (disclaimer: I am its author) provides extensions to numpy to perform these types of operations efficiently and concisely:
import numpy_indexed as npi
groups, means = npi.group_by(array1).mean(array2)
Note that you can easily perform other kinds of reductions in the same manner, such as a median.
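For readers who do not have the package installed, a grouped median can also be sketched in plain NumPy. This is only an illustration of the same kind of reduction, not how numpy_indexed implements it; group_median is a hypothetical helper name, and the sort-and-split technique is swapped in for the example.

```python
import numpy as np

def group_median(keys, values):
    """Hypothetical helper: median of `values` for each distinct key."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Sort both arrays by key so that equal keys sit next to each other
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # Index of the first occurrence of each distinct key in the sorted keys
    unq, start = np.unique(sorted_keys, return_index=True)
    # Split the sorted values at the group boundaries, one chunk per key
    chunks = np.split(sorted_values, start[1:])
    medians = np.array([np.median(chunk) for chunk in chunks])
    return unq, medians

array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])
groups, medians = group_median(array1, array2)
```

The Python-level loop runs once per group rather than once per element, so it stays reasonably cheap as long as the number of distinct keys is small compared to the array length.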
Upvotes: 3
Reputation: 221504
Here's an approach using np.unique and np.bincount:
# Get unique array1 elems, tag them starting from 0 and get their tag counts
unq,ids,count = np.unique(array1,return_inverse=True,return_counts=True)
# Use the tags/IDs to perform ID based summation of array2 elems and
# thus divide by the ID counts to get ID based average values
out = np.column_stack((unq,np.bincount(ids,array2)/count))
Sample run -
In [16]: array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
...: array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])
...:
In [18]: out
Out[18]:
array([[ -1. , 22.7 ],
[ 1. , 55.72333333],
[ 2. , 33.28666667]])
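Put together as a self-contained script, the approach above might look like the following sketch. The function name group_mean is an assumption made for illustration; the np.unique and np.bincount calls are the same ones used in the answer.

```python
import numpy as np

def group_mean(keys, values):
    """Hypothetical wrapper: one row [key, mean of values] per distinct key."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Distinct keys, a 0-based group ID for every element, and group sizes
    unq, ids, count = np.unique(keys, return_inverse=True, return_counts=True)
    # Weighted bincount sums `values` per group ID; dividing gives the means
    sums = np.bincount(ids.ravel(), weights=values)
    return np.column_stack((unq, sums / count))

array1 = np.array([-1, -1, 1, 1, 2, 1, 2, 2])
array2 = np.array([34.2, 11.2, 22.1, 78.2, 55.0, 66.87, 33.3, 11.56])
out = group_mean(array1, array2)
print(out)
```

Note that np.column_stack promotes the integer keys to float, which is why the first column prints as -1., 1., 2. in the sample run.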
Upvotes: 1