Reputation: 2253
I have a numpy array with shape (N, 2) and N > 10000. In the first column I have e.g. 6 class values (e.g. 0.0, 0.2, 0.4, 0.6, 0.8, 1.0); in the second column I have float values. Now I want to calculate the average of the second column for each distinct class in the first column, resulting in 6 averages, one per class.
Is there a numpy way to do this, to avoid manual loops especially if N is very large?
Upvotes: 1
Views: 294
Reputation: 2253
I copied Warren's answer here, since it solves my problem best and I want to mark the question as solved:
This is a "groupby/aggregation" operation. The question is very close to being a duplicate of "getting median of particular rows of array based on index". ... You could also use scipy.ndimage.labeled_comprehension as suggested there, but you would have to convert the first column to integers (e.g. idx = (5*data[:, 0]).astype(int)).
I did exactly this.
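For anyone who wants to try the same route, here is a minimal sketch of how the labeled_comprehension approach might look. The sample data is made up, and the use of np.rint before the integer cast is my own substitution for the plain astype(int) quoted above: it rounds to the nearest integer, so a product that lands slightly below a whole number is not truncated to the wrong label.

```python
import numpy as np
from scipy import ndimage

# Made-up sample data: column 0 holds class values, column 1 holds floats.
data = np.array([
    [0.0, 1.0],
    [0.2, 2.0],
    [0.0, 3.0],
    [0.4, 5.0],
    [0.2, 4.0],
])

# Map the class values 0.0, 0.2, ..., 1.0 to integer labels 0..5.
# np.rint is an assumption added here to guard against float truncation.
labels = np.rint(5 * data[:, 0]).astype(int)

# Labels for which a mean is requested (all six classes from the question).
index = np.arange(6)

# Apply np.mean to the column-1 values of each label; labels that do not
# occur in the data receive the default value (NaN here).
means = ndimage.labeled_comprehension(
    data[:, 1], labels, index, np.mean, float, np.nan
)
print(means)
```

Classes that never occur in the data come back as NaN, which makes missing classes easy to spot.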
Upvotes: 0
Reputation: 67427
In pure numpy you would do something like:
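import numpy as np
# idx maps each row to its class in unq; cnt holds the number of rows per class
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True,
                          return_counts=True)
# Sum the second column per class, then divide by the class counts
avg = np.bincount(idx, weights=arr[:, 1]) / cnt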
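As a quick check of the approach above, here is a small self-contained run on made-up data (the array contents are purely illustrative; only the np.unique and np.bincount calls come from the snippet):

```python
import numpy as np

# Made-up sample data: column 0 holds class values, column 1 holds floats.
arr = np.array([
    [0.0, 1.0],
    [0.2, 2.0],
    [0.0, 3.0],
    [0.4, 5.0],
    [0.2, 4.0],
])

# unq: sorted distinct class values
# idx: for each row, the position of its class within unq
# cnt: number of rows belonging to each class
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True, return_counts=True)

# bincount with weights sums the second column per class index.
sums = np.bincount(idx, weights=arr[:, 1])
avg = sums / cnt

for value, mean in zip(unq, avg):
    print(value, mean)  # classes 0.0, 0.2, 0.4 with means 2.0, 3.0, 5.0
```

Note that np.unique compares floats exactly, so class values that differ only by rounding error would end up in separate groups; rounding the first column beforehand is one way to avoid that.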
Upvotes: 3