Reputation: 1511
I have an array with 3 columns. The first column holds values between 1 and 10. I need to extract all rows where the first column is 1 and normalize the third column of that slice, then repeat the same thing for all rows where the first column is 2, and so on.
If I run this code, it leaves the array unchanged:
for u in np.unique(x[:,0]):
    mask = x[:, 0] == u
    x[mask][:,2] = x[mask][:,2]/np.sum(x[mask][:,2])
If I run this other piece of code, I can see (I placed a print r in the loop) that r comes out exactly as I want. The only problem is that the original array x is still unchanged.
for u in np.unique(x[:,0]):
    r = x[x[:, 0] == u]
    r[:,2] = r[:,2]/np.sum(x[x[:,0]==u][:,2])
Why is that? What am I doing wrong???
Upvotes: 1
Views: 264
Reputation: 221594
Here's an alternative vectorized approach, written with performance in mind, that solves your problem using np.unique and np.bincount:
tags = np.unique(x[:,0], return_inverse=1)[1]
x[:,2] /= np.bincount(tags, x[:,2])[tags]
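To make the two lines above concrete, here is a minimal self-contained sketch on a small made-up array (the values are illustrative, not taken from the question). It assumes x has a float dtype, since the in-place division would fail on an integer array.

```python
import numpy as np

# Illustrative data: column 0 is the group label, column 2 is the value to normalize.
x = np.array([[1, 0, 2.0],
              [2, 0, 1.0],
              [1, 0, 6.0],
              [2, 0, 3.0]])

# tags[i] is the position of x[i, 0] among the sorted unique labels.
tags = np.unique(x[:, 0], return_inverse=True)[1]

# A weighted bincount sums column 2 per group; indexing the result with
# tags broadcasts each group's sum back onto the rows of that group.
group_sums = np.bincount(tags, weights=x[:, 2])
x[:, 2] /= group_sums[tags]
```

After this runs, the third column of each group sums to 1, and no Python-level loop over the groups is needed.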
To boost performance further, one can avoid np.unique and directly compute the equivalent of np.bincount(tags, x[:,2]), making use of the fact that the numbers in the first column are between 1 and 10:
np.bincount(x[:,0].astype(int), x[:,2], minlength=11)[1:]
To replace tags, we could use the first column shifted to start at 0, like so:
tags = x[:,0].astype(int)-1
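Put together, the variant without np.unique might look like the sketch below, written against a single float array x. The sample data is made up, and the code assumes the labels in the first column are whole numbers in the range 1 to 10.

```python
import numpy as np

# Illustrative data; labels in column 0 are assumed to be whole numbers in 1..10.
x = np.array([[3, 0, 1.0],
              [1, 0, 2.0],
              [3, 0, 4.0],
              [1, 0, 2.0]])

labels = x[:, 0].astype(int)

# minlength=11 guarantees slots for labels 0..10; dropping slot 0 leaves
# one sum per label 1..10 (zero for labels that never occur).
sums = np.bincount(labels, weights=x[:, 2], minlength=11)[1:]

# Shift the labels to be 0-based so they index into sums.
tags = labels - 1
x[:, 2] /= sums[tags]
```

Labels that never occur get a sum of zero, but sums[tags] only ever picks entries for labels that are present, so those zeros are never used as divisors.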
Upvotes: 1
Reputation: 7553
Don't index twice. Indexing with a boolean mask returns a copy of the selected rows, so x[mask][:,2] = ... assigns into that temporary copy and x itself is never modified. Use x[mask,2] instead of x[mask][:,2]:
for u in np.unique(x[:,0]):
    mask = x[:, 0] == u
    x[mask,2] = x[mask,2]/np.sum(x[mask,2])
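The difference between the two forms can be checked directly. The following is a small sketch on made-up data; the variable names sub, shares, after_chained and after_single are only for illustration.

```python
import numpy as np

x = np.array([[1, 0, 2.0],
              [1, 0, 6.0],
              [2, 0, 5.0]])
mask = x[:, 0] == 1

# Boolean-mask indexing returns a new array, not a view of x.
sub = x[mask]
shares = np.shares_memory(sub, x)

# Chained indexing: the assignment lands in a temporary copy, x is untouched.
x[mask][:, 2] = 0.0
after_chained = x[:, 2].copy()

# Single indexing: one assignment that writes directly into x.
x[mask, 2] = 0.0
after_single = x[:, 2].copy()
```

Here shares is False, after_chained still holds the original third column, and after_single has zeros in the masked rows. The same reasoning explains why modifying r in the question's second loop had no effect on x.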
Upvotes: 1