va404

Reputation: 21

Apply condition on DataFrame nested array

From a Pandas DataFrame like the following:

I want to filter it so that it shows only the rows whose arrays have all elements in the range 10 < Muon_pt < 20, or at least one element in the range 50 < Electron_pt < 100.

I do so by defining two functions:

def anyCut(x, minn, maxx):
    for i in x:
        if i > minn and i < maxx:
            return True
    return False

def allCut(x, minn, maxx):
    for i in x:
        if i < minn or i > maxx:
            return False
    return True

And then, applying it:

minElectronPt = 50.0
maxElectronPt = 100.0

minMuonPt = 10
maxMuonPt = 20

df[
    (
        (df["nElectron"]>1)
        &
        (df["nMuon"]>1)
    )
    &
    (
        (df["Electron_charge"].apply(lambda x: all(x == -1)))
        &
        (
            (
                df["Electron_pt"].apply(lambda x: anyCut(x, minElectronPt, maxElectronPt))
            )

            |

            (
                df["Muon_pt"].apply(lambda x: allCut(x, minMuonPt, maxMuonPt))
            )
        )
    )
].head()

Getting:

Is there any way to apply this filter without looping through the nested arrays (i.e. to replace the anyCut and allCut functions)?

Upvotes: 2

Views: 250

Answers (1)

Nathan

Reputation: 471

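You can convert each nested array to a NumPy array and replace the explicit for loops with vectorised comparisons:

import numpy as np

def anyCut(x, minn, maxx):
    # True if at least one element lies strictly between minn and maxx
    x_np = np.array(x)
    return bool(((x_np > minn) & (x_np < maxx)).any())

def allCut(x, minn, maxx):
    # True if every element lies within [minn, maxx], matching the original loop
    x_np = np.array(x)
    return bool(((x_np >= minn) & (x_np <= maxx)).all())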
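
As a rough sketch of how such vectorised checks plug into the filter from the question, assuming each cell holds a NumPy array (or a list): the small DataFrame below is made up purely for illustration and only includes the two pt columns, and the helper definitions are repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def anyCut(x, minn, maxx):
    # True if at least one element lies strictly between minn and maxx
    x_np = np.asarray(x)
    return bool(((x_np > minn) & (x_np < maxx)).any())

def allCut(x, minn, maxx):
    # True if every element lies within [minn, maxx]
    x_np = np.asarray(x)
    return bool(((x_np >= minn) & (x_np <= maxx)).all())

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    "Electron_pt": [np.array([60.0, 120.0]),
                    np.array([10.0, 20.0]),
                    np.array([5.0, 200.0])],
    "Muon_pt": [np.array([30.0, 40.0]),
                np.array([12.0, 18.0]),
                np.array([12.0, 25.0])],
})

# Extra positional arguments are forwarded to the helper via `args`
electron_ok = df["Electron_pt"].apply(anyCut, args=(50.0, 100.0))
muon_ok = df["Muon_pt"].apply(allCut, args=(10, 20))

# Keep rows that pass either the electron or the muon condition
selected = df[electron_ok | muon_ok]
print(selected)
```

Note that `.apply` still visits the rows one at a time in Python; only the inner loop over each array's elements is handed to NumPy.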

Upvotes: 1
