Mauro Gentile

Reputation: 1511

Normalize slices of a ndarray

I have an array with 3 columns. The first column holds values between 1 and 10. I need to extract all rows where the first column is 1 and normalize the third column of that slice so it sums to 1, then repeat the same thing for the rows where the first column equals 2, and so on.

If I run this code, it leaves the array unchanged:

for u in np.unique(x[:,0]):
    mask= x[:, 0] == u
    x[mask][:,2]=x[mask][:,2]/np.sum((x[mask][:,2]))

If I run this other piece of code, I can see that r (I placed a print r in the loop) works exactly as I want. The only problem is that the original array x is unchanged.

for u in np.unique(x[:,0]):
    r = x[x[:, 0] == u]
    r[:,2]=r[:,2]/np.sum((x[x[:,0]==u][:,2]))

Why is that? What am I doing wrong?

Upvotes: 1

Views: 264

Answers (2)

Divakar

Reputation: 221594

Here's an alternative vectorized approach with performance in mind to solve your problem using np.unique and np.bincount -

tags = np.unique(x[:,0], return_inverse=True)[1]
x[:,2] /= np.bincount(tags, x[:,2])[tags]
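As a quick sanity check, here is a minimal sketch of those two lines on a small made-up array (the sample values are an assumption, not data from the question). Note that x needs a float dtype for the in-place division to work.

```python
import numpy as np

# Hypothetical sample data: column 0 is the group label,
# column 2 holds the values to normalize within each group.
x = np.array([[1., 0., 2.],
              [1., 0., 6.],
              [2., 0., 1.],
              [2., 0., 3.],
              [3., 0., 5.],
              [1., 0., 2.]])

# tags[i] is the position of x[i, 0] among the sorted unique labels.
tags = np.unique(x[:, 0], return_inverse=True)[1]

# A weighted bincount yields the sum of column 2 per group;
# indexing with tags maps each group's sum back onto its rows.
group_sums = np.bincount(tags, weights=x[:, 2])
x[:, 2] /= group_sums[tags]

print(x[:, 2])
```

After this runs, the third column sums to 1 within each group of equal first-column values.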

To further boost performance, one can skip np.unique and compute the equivalent of np.bincount(tags, x[:,2]) directly, making use of the fact that the numbers in the first column are integers between 1 and 10:

np.bincount(x[:,0].astype(int), x[:,2], minlength=11)[1:]

The [1:] drops the unused bin for 0, so the matching replacement for tags is the first column shifted down by one:

tags = x[:,0].astype(int)-1
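Put together, the np.unique-free variant might look like the sketch below. The random test data and the names labels and sums are assumptions made for illustration; the approach relies on the first column holding integer-valued labels from 1 to 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: integer labels 1..10 in column 0 (stored as floats),
# strictly positive values in column 2.
x = np.column_stack([rng.integers(1, 11, size=50).astype(float),
                     np.zeros(50),
                     rng.random(50) + 0.1])
original = x.copy()

labels = x[:, 0].astype(int)

# Per-label sums of column 2. minlength=11 guarantees bins 0..10 exist
# even if some label is absent; [1:] drops the unused bin for 0.
sums = np.bincount(labels, weights=x[:, 2], minlength=11)[1:]

# Shift labels to 0..9 so they index into sums.
tags = labels - 1
x[:, 2] /= sums[tags]

# Each group's third column should now sum to 1.
for u in np.unique(labels):
    print(u, x[labels == u, 2].sum())
```

A label that never occurs gets a sum of 0 in sums, but that entry is never used as a divisor because no row points at it.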

Upvotes: 1

sietschie

Reputation: 7553

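Don't index twice. Indexing with a boolean mask (x[mask]) returns a copy of the selected rows, not a view, so x[mask][:,2] = ... assigns into a temporary array and leaves x untouched. Use x[mask,2] instead of x[mask][:,2], which is a single indexing operation that writes into x itself: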

for u in np.unique(x[:,0]):
    mask = x[:, 0] == u
    x[mask,2] = x[mask,2]/np.sum(x[mask,2])
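The difference between the two forms can be seen in a small experiment. This is a sketch on made-up values (the array contents are an assumption), using np.shares_memory to confirm that the masked selection does not alias x.

```python
import numpy as np

# Hypothetical sample data: two rows with label 1, one row with label 2.
x = np.array([[1., 0., 2.],
              [1., 0., 6.],
              [2., 0., 4.]])
mask = x[:, 0] == 1

# Chained indexing: x[mask] builds a new array first, so the assignment
# lands in that temporary and x keeps its values.
x[mask][:, 2] = 0.0
after_chained = x.copy()

# Single indexing operation: the assignment is applied to x directly.
x[mask, 2] = x[mask, 2] / np.sum(x[mask, 2])

print(np.shares_memory(x, x[mask]))  # False: the masked selection is a copy
print(after_chained[:, 2])           # third column still 2, 6, 4
print(x[:, 2])                       # rows with label 1 now sum to 1
```

The same reasoning explains the second snippet in the question: r = x[x[:, 0] == u] is already a copy, so modifying r never reaches x.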

Upvotes: 1
