Reputation: 1998

Filter df based on multiple column values of other columns in the same df

I have df below:

id | status | id_reference
1    True        NaN
4    False       1
2    False       Nan
7    False       3
6    True        2
10   True        4

How can I only filter or want the rows in the df, where:

the status column is True and the id_reference column value exists in another row in the df as a value in the id column as a value.

So based on above df,

I would want rows below:

id | status | id_reference 
6    True        2
10   True        4

actual df is much larger as this is a sample

Upvotes: 2

Answers (4)

Henry Yik

Reputation: 22503

Cast id_reference to float if you haven't already and use isin:

print (df.loc[df["status"]&df["id_reference"].astype(float).isin(df["id"])])

   id  status id_reference
4   6    True            2
5  10    True            4

Upvotes: 3

Saurabh Porwal

Reputation: 46

def filterDataSet(self,dataSetValues):
    try :
        print(dataSetValues)  dataSetValues= dataSetValues.fillna(0)
        dataSetValues = dataSetValues[dataSetValues.id_reference>0]
        dataSetValues = dataSetValues[dataSetValues.status == True]
        return dataSetValues
    except ValueError as ve:
        print("Exception while filter data ", ve)

Upvotes: 0

Soumendra Mishra

Reputation: 3663

You can try this:

df = df.query('status == True & id_reference.notnull()')
print(df)

Upvotes: 0

Sadiq Raza

Reputation: 354

Here is what you can try:

df.loc[df["status"] & df["id_ref"].isin(df["id"])]

Upvotes: 2

Filter df based on multiple column values of other columns in the same df

Answers (4)

Related Questions