Alan
Alan

Reputation: 559

How To Filter Pandas Dataframe Ignoring Null Columns

Suppose I have the following dataframe:

     col1    col2    col3    col4
0    8       2       nan     nan               
1    nan     2       15      nan            
2    nan     2       15      4           
3    3       2       15      nan            

How can I filter the dataframe to show all rows which contain matching non-nan values? For example if I applied the following filter...

     col1    col2    col3    col4             
0    nan     2       15      nan          

...the desired output should be:

     col1    col2    col3    col4             
0    nan     2       15      nan            
1    nan     2       15      4           
2    3       2       15      nan   

Upvotes: 1

Views: 1981

Answers (2)

Valentino
Valentino

Reputation: 7361

Assuming that filtcol is your filter (a dataframe with one row) and df your starting dataframe, do:

cols = filtcol.dropna(1).columns
ddf = df.loc[~df[cols].isna().any(axis=1)]

cols is an index holding the names of the columns in your filter whose values are not NaN.
ddf is obtained by selecting the rows in the original dataframe whose col column values are all not NaN.

ddf is:

   col1  col2  col3  col4
1   NaN     2  15.0   NaN
2   NaN     2  15.0   4.0
3   3.0     2  15.0   NaN

Note that this solution checks only if the value is NaN or not. That means that your filter can have any non NaN value, there is no need to match the exact values in the dataframe. You will get the same result even if your filter is, for example:

   col1  col2  col3  col4
0   NaN     0     0   NaN

Upvotes: 1

anky
anky

Reputation: 75080

How can I filter the dataframe to show all rows which contain matching non-nan values

You can first dropna() on axis=1 to get rid of columns which has NaN in the filter df. Then merge;

print(df)
print('\n')
print(f)

   col1  col2  col3  col4
0   8.0     2   NaN   NaN
1   NaN     2  15.0   NaN
2   NaN     2  15.0   4.0
3   3.0     2  15.0   NaN


   col1  col2  col3  col4
0   NaN     2    15   NaN

final=df.merge(f.dropna(1))

   col1  col2  col3  col4
0   NaN     2  15.0   NaN
1   NaN     2  15.0   4.0
2   3.0     2  15.0   NaN

Upvotes: 3

Related Questions