Reputation: 4584
I have data frame of many columns consisting float values. I want to delete a row if any of the columns have value below 20.
code:
xdf = pd.DataFrame({'A':np.random.uniform(low=-50, high=53.3, size=(5)),'B':np.random.uniform(low=10, high=130, size=(5)),'C':np.random.uniform(low=-50, high=130, size=(5)),'D':np.random.uniform(low=-100, high=200, size=(5))})
xdf =
A B C D
0 -9.270533 42.098425 91.125009 148.350655
1 17.771411 55.564825 106.396381 -89.082831
2 -22.602563 99.330643 17.590466 73.985202
3 15.890920 76.011631 52.366311 194.023063
4 35.202379 41.973846 32.576890 100.523902
# my code
xdf[xdf[cols].ge(20).all(axis=1)]
Out[17]:
A B C D
4 35.202379 41.973846 32.57689 100.523902
Expected output: drop a row if any column has below 20 value
xdf =
A B C D
4 35.202379 41.973846 32.576890 100.523902
Is this the best way of doing it?
Upvotes: 2
Views: 94
Reputation: 19610
You can use the numpy equivalent of .ge instead:
xdf.loc[np.greater(xdf,20).all(axis=1)]
Upvotes: 1
Reputation: 4318
To do it in numpy:
xdf = pd.DataFrame({'A':np.random.uniform(low=-50, high=53.3, size=(5)),'B':np.random.uniform(low=10, high=130, size=(5)),'C':np.random.uniform(low=-50, high=130, size=(5)),'D':np.random.uniform(low=-100, high=200, size=(5))})
%timeit xdf[xdf[['A','B','C','D']].ge(20).all(axis=1)]
%timeit xdf[(xdf[['A','B','C','D']].values >= 20).all(axis=1)]
705 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
460 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
If you do not want to keep result in DataFrame this can even be faster:
xdf.values[(xdf[['A','B','C','D']].values >= 20).all(axis=1)]
Upvotes: 1
Reputation: 86
As numpy is lighter and therefore faster in terms of calculations with numbers, try this:
a = np.array([np.random.uniform(low=-50, high=53.3, size=(5)),
np.random.uniform(low=10, high=130, size=(5)),
np.random.uniform(low=-50, high=130, size=(5)),
np.random.uniform(low=-100, high=200, size=(5))])
print(a[np.all(a > 20, axis=1)])
If you want to stick with pandas, another idea would be:
xdfFiltered = xdf.loc[(xdf["A"] > 20) & (xdf["B"] > 20) & (xdf["C"] > 20) & (xdf["D"] > 20)]
Upvotes: 1