Reputation: 21
From a Pandas DataFrame like the following:
I want to apply a filter so it shows only the rows containing arrays with all elements within the range 10 < Muon_pt < 20 or some elements within the range 50 < Electron_pt < 100.
I do so by defining two functions:
def anyCut(x, minn, maxx):
    for i in x:
        if i > minn and i < maxx:
            return True
    return False

def allCut(x, minn, maxx):
    for i in x:
        if i < minn or i > maxx:
            return False
    return True
And then, applying it:
minElectronPt = 50.0
maxElectronPt = 100.0
minMuonPt = 10
maxMuonPt = 20
df[
    (
        (df["nElectron"] > 1)
        &
        (df["nMuon"] > 1)
    )
    &
    (
        (df["Electron_charge"].apply(lambda x: all(x == -1)))
        &
        (
            (
                df["Electron_pt"].apply(lambda x: anyCut(x, minElectronPt, maxElectronPt))
            )
            |
            (
                df["Muon_pt"].apply(lambda x: allCut(x, minMuonPt, maxMuonPt))
            )
        )
    )
].head()
Getting:
Is there any way to apply this filter without looping through the nested arrays (i.e. to replace the anyCut and allCut functions)?
Upvotes: 2
Views: 250
Reputation: 471
Here you can use NumPy arrays and replace the explicit for loops with vectorized comparisons, like:
import numpy as np

def anyCut(x, minn, maxx):
    # True if at least one element lies strictly between minn and maxx
    x_np = np.asarray(x)
    return bool(((x_np > minn) & (x_np < maxx)).any())

def allCut(x, minn, maxx):
    # True if every element lies within [minn, maxx], as in the question's allCut
    x_np = np.asarray(x)
    return bool(((x_np >= minn) & (x_np <= maxx)).all())
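To check that NumPy-based helpers select the same rows as the looped versions, here is a minimal sketch on a toy DataFrame. The data is made up for illustration and is not the asker's DataFrame, and the names any_in_range and all_in_range are hypothetical stand-ins for anyCut and allCut. Note that apply still visits each row once; only the inner loop over each array is vectorized.

```python
import numpy as np
import pandas as pd


def any_in_range(values, low, high):
    """True if at least one element lies strictly inside (low, high)."""
    arr = np.asarray(values)
    return bool(((arr > low) & (arr < high)).any())


def all_in_range(values, low, high):
    """True if every element lies inside [low, high]; True for an empty array."""
    arr = np.asarray(values)
    return bool(((arr >= low) & (arr <= high)).all())


# Toy data, invented for this example only.
df = pd.DataFrame({
    "nElectron": [2, 2, 2],
    "nMuon": [2, 2, 2],
    "Electron_charge": [np.array([-1, -1]), np.array([-1, 1]), np.array([-1, -1])],
    "Electron_pt": [np.array([60.0, 5.0]), np.array([70.0, 80.0]), np.array([5.0, 6.0])],
    "Muon_pt": [np.array([30.0, 40.0]), np.array([12.0, 15.0]), np.array([11.0, 19.0])],
})

# Same logical structure as the filter in the question.
mask = (
    (df["nElectron"] > 1)
    & (df["nMuon"] > 1)
    & df["Electron_charge"].apply(lambda q: bool((np.asarray(q) == -1).all()))
    & (
        df["Electron_pt"].apply(any_in_range, args=(50.0, 100.0))
        | df["Muon_pt"].apply(all_in_range, args=(10, 20))
    )
)
selected = df[mask]
```

Row 0 passes through the electron cut, row 1 fails the charge requirement, and row 2 passes through the muon cut, so selected keeps rows 0 and 2.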
Upvotes: 1