Reputation: 24121
I have an m-by-n NumPy array A, where each row represents an observation of some data. My rows are also assigned to one of c classes, and the class for each row is stored in an m-by-1 NumPy array B. I now want to compute the mean observation data M for each class. How can I do this?
For example:
import numpy
A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])  # the first row is class 1, the second row is class 0, ...
M = ...  # Do something
This should give me the output:
>>> M
numpy.array([[2, 3, 4], [2.5, 3.5, 4.5]])
Here, row i in M is the mean for class i.
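For reference, a straightforward loop-based baseline masks the rows of each class and averages them. This is a minimal sketch that assumes every label in B occurs at least once. The result rows follow the sorted unique labels, so "row i is class i" holds only when the labels are 0..c-1.

```python
import numpy

A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])

# One boolean mask per class: A[B == k] selects the rows belonging to class k.
classes = numpy.unique(B)
M = numpy.array([A[B == k].mean(axis=0) for k in classes])
# M is [[2, 3, 4], [2.5, 3.5, 4.5]]
```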
Upvotes: 1
Views: 337
Reputation: 10759
This is a typical grouping problem, which can be solved in a single line using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
# Returns the unique class labels and the per-class means
labels, M = npi.group_by(B).mean(A)
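If you would rather avoid the extra dependency, the same sort-then-reduce grouping idea can be sketched in plain NumPy with numpy.argsort and numpy.add.reduceat. This is an illustrative alternative and is not a description of how numpy_indexed works internally. The helper name group_mean is hypothetical, and the sketch assumes a non-empty, 1-D keys array.

```python
import numpy

def group_mean(keys, values):
    """Hypothetical helper: mean of the rows of `values`, grouped by `keys`.

    Returns (unique_keys, means), with groups in sorted key order.
    """
    # Stable sort so rows with equal keys become contiguous.
    order = numpy.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # Position where each new group starts in the sorted arrays.
    starts = numpy.flatnonzero(numpy.r_[True, sorted_keys[1:] != sorted_keys[:-1]])
    # Sum each contiguous group, then divide by the group sizes.
    sums = numpy.add.reduceat(sorted_values, starts, axis=0)
    counts = numpy.diff(numpy.r_[starts, len(keys)])
    return sorted_keys[starts], sums / counts[:, None]

A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])
unique_keys, M = group_mean(B, A)
```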
Upvotes: 0
Reputation: 19547
Another way to do this is with the ufunc.at method (here numpy.add.at), which performs unbuffered in-place operations:
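import numpy
A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])
# u: sorted unique labels; uinds: position of each row's label in u
u, uinds = numpy.unique(B, return_inverse=True)
M = numpy.zeros((u.shape[0], A.shape[-1]))
# Unbuffered add: rows sharing a class index all accumulate into M
numpy.add.at(M, uinds, A)
# Divide each class sum by that class's row count
M /= numpy.bincount(uinds)[:, None]
>>> M
array([[2. , 3. , 4. ],
       [2.5, 3.5, 4.5]])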
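Why use numpy.add.at instead of plain fancy-index assignment? M[idx] += A is buffered, so when an index repeats, only one of the updates to that row survives. The following small comparison uses the same A and uinds as above.

```python
import numpy

A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])
u, uinds = numpy.unique(B, return_inverse=True)

# Buffered: each class row is repeated in uinds, but only one update per row sticks.
buffered = numpy.zeros((u.shape[0], A.shape[-1]))
buffered[uinds] += A

# Unbuffered: every row of A is accumulated into its class row.
unbuffered = numpy.zeros((u.shape[0], A.shape[-1]))
numpy.add.at(unbuffered, uinds, A)
# unbuffered holds the per-class sums [[4, 6, 8], [5, 7, 9]]
```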
As mentioned, pandas makes this easier:
>>> import pandas as pd
>>> pd.DataFrame(A).groupby(B).mean()
     0    1    2
0  2.0  3.0  4.0
1  2.5  3.5  4.5
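The groupby result is a DataFrame indexed by class label. If you need the plain NumPy array M from the question, DataFrame.to_numpy() converts it. This is a minimal sketch that assumes pandas 0.24 or later, where to_numpy is available.

```python
import numpy
import pandas as pd

A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])

# Group the rows of A by the labels in B; the index holds the sorted labels.
grouped = pd.DataFrame(A).groupby(B).mean()
labels = grouped.index.to_numpy()  # the class labels, one per row of M
M = grouped.to_numpy()             # float array, one row per label
```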
Upvotes: 2
Reputation: 14377
As mentioned in a comment, pandas may be more useful depending on where you want to go with this. Still, this is possible with plain NumPy:
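import numpy
A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])
# m-by-c boolean matrix: entry (i, k) is True if row i is in class k
class_indicators = B[:, numpy.newaxis] == numpy.unique(B)
# The pseudo-inverse of the indicator matrix averages the rows of each class
mean_operator = numpy.linalg.pinv(class_indicators.astype(float))
means = mean_operator.dot(A)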
This works for any number of classes, but as you can see, it can be cumbersome.
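Here is why the pseudo-inverse works. When every class occurs, the m-by-c indicator matrix X has full column rank, so pinv(X) = (X^T X)^-1 X^T. Each row of X has exactly one 1, so X^T X is diagonal and holds the class counts. The mean operator therefore just sums the rows of each class and divides by the count. A minimal sketch that computes this directly, without an SVD:

```python
import numpy

A = numpy.array([[1, 2, 3], [1, 2, 3], [3, 4, 5], [4, 5, 6]])
B = numpy.array([1, 0, 0, 1])

# m-by-c indicator matrix: X[i, k] = 1.0 if row i belongs to class k.
X = (B[:, numpy.newaxis] == numpy.unique(B)).astype(float)

# X.T @ A sums the rows of each class; X.sum(axis=0) counts them.
means_direct = (X.T @ A) / X.sum(axis=0)[:, numpy.newaxis]

# Matches the pseudo-inverse formulation.
means_pinv = numpy.linalg.pinv(X).dot(A)
assert numpy.allclose(means_direct, means_pinv)
```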
Upvotes: 3