Michael Hecht

Reputation: 2253

Aggregate values of one column by classes in a second column using numpy

I have a numpy array of shape (N, 2) with N > 10000. The first column holds a small set of class values, e.g. 6 of them (0.0, 0.2, 0.4, 0.6, 0.8, 1.0), and the second column holds float values. I want to calculate the average of the second column for each distinct class in the first column, giving 6 averages, one per class.

Is there a numpy way to do this that avoids manual loops, especially when N is very large?

Upvotes: 1

Views: 294

Answers (2)

Michael Hecht

Reputation: 2253

I copied Warren's answer here, since it solves my problem best and I want to mark the question as solved:

This is a "groupby/aggregation" operation. The question is this close to being a duplicate of getting median of particular rows of array based on index. ... You could also use scipy.ndimage.labeled_comprehension as suggested there, but you would have to convert the first column to integers (e.g. idx = (5*data[:, 0]).astype(int)).

I did exactly this.
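
For anyone landing here later, this is a minimal sketch of the integer-label conversion described above. It is not my exact code: the sample `data` array is made up, and I use `np.bincount` for the aggregation step so the snippet needs only numpy (the integer labels are what `scipy.ndimage.labeled_comprehension` would take as well). I also assume `np.rint` before the cast, which is a small safeguard I added against float noise, not part of Warren's suggestion.

```python
import numpy as np

# Hypothetical sample data: column 0 holds class values, column 1 holds floats
data = np.array([[0.0, 1.0],
                 [0.2, 2.0],
                 [0.2, 4.0],
                 [0.6, 3.0],
                 [1.0, 5.0],
                 [1.0, 7.0]])

# Map class values 0.0, 0.2, ..., 1.0 to integer labels 0..5.
# np.rint rounds before the cast so that float noise cannot truncate a label.
labels = np.rint(5 * data[:, 0]).astype(int)

# Per-label sums and counts; minlength=6 keeps a slot for every class
sums = np.bincount(labels, weights=data[:, 1], minlength=6)
counts = np.bincount(labels, minlength=6)

# Classes without any rows stay NaN instead of triggering a division by zero
means = np.full(6, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)
```

The factor 5 and the length 6 only fit the class values from my question; other class spacings would need a different scaling.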

Upvotes: 0

Jaime

Reputation: 67427

In pure numpy you would do something like:

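import numpy as np
# unq: sorted class values, idx: class index of each row, cnt: rows per class
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True,
                          return_counts=True)
# Sum the second column per class, then divide by the class sizes
avg = np.bincount(idx, weights=arr[:, 1]) / cnt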
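
As a rough, self-contained check of the approach, the sketch below builds a random array in the shape described in the question and compares the result against a plain per-class loop. The test data, the seed, and the name `ref` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical test data: 20000 rows, six class values in column 0
rng = np.random.default_rng(0)
classes = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
arr = np.column_stack([rng.choice(classes, size=20000),
                       rng.random(20000)])

# Vectorized group-by mean: one pass to label the rows, one to sum them
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True,
                          return_counts=True)
avg = np.bincount(idx, weights=arr[:, 1]) / cnt

# Reference result: one boolean mask per class, for comparison only
ref = np.array([arr[arr[:, 0] == c, 1].mean() for c in unq])

print(np.allclose(avg, ref))
```

Since `np.unique` only returns classes that actually occur, `cnt` is never zero and the division is safe; `avg[i]` is the mean for class `unq[i]`.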

Upvotes: 3
