Sonny Rogers

Reputation: 31

Filtering function for pandas - Viewing NaN values within a column

Function I have created:

#Create a function that identifies blank values
def GPID_blank(df, variable):
    df = df.loc[df['GPID'] == variable]
    return df

Test:

variable = ''
test = GPID_blank(df, variable)
test

Goal: Create a function that filters any dataframe on its 'GPID' column and returns all of the rows where GPID has missing data.

I have tried running variable = 'NaN' and still no luck. However, I know the function works: if I pass a real value such as "OH82CD85", it filters my dataset accordingly.

So why doesn't it return the blank cells when variable = 'NaN'? I know that my dataset has 5 rows with missing GPID data.

Example df:

df = pd.DataFrame({'Client': ['A','B','C'], 'GPID':['BRUNS2','OH82CD85','']})

   Client  GPID
0  A       BRUNS2
1  B       OH82CD85
2  C

Sample of GPID column:

0     OH82CD85
1     BW07TI20
2     OW36HW81
3     PE56TA73
4     CT46SX81
5     OD79AU80
6     GF46DB60
7     OL07ST01
8     VP38SM57
9     AH90AE61
10    PG86KO78
11         NaN
12         NaN
13    SO21GR72
14    DY85IN90
15    KW80CV02
16    CM15QP83
17    VC38FP82
18    DA36RX05
19    DD74HD38

Upvotes: 3

Views: 47

Answers (3)

weasel

Reputation: 574

You can't match NaN values with an equality comparison the way you match an ordinary value. Also, in your example dataframe, '' is not NaN but an empty str, so it can be matched with ==.

Instead, check whether the value you are filtering for is NaN, and filter differently in that case:

def GPID_blank(df, variable):
    if pd.isna(variable):
        df = df.loc[df['GPID'].isna()]
    else:
        df = df.loc[df['GPID'] == variable]
    return df
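
As a rough illustration of the difference between '' and NaN, here is a minimal sketch that runs the function on a small made-up frame. The extra client 'D' with a real NaN is an assumption added for the demonstration; it is not part of the asker's data.

```python
import numpy as np
import pandas as pd

def GPID_blank(df, variable):
    # NaN never compares equal to anything, so it needs its own branch
    if pd.isna(variable):
        return df.loc[df['GPID'].isna()]
    return df.loc[df['GPID'] == variable]

# Made-up sample: row C holds an empty string, row D holds a real NaN
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D'],
    'GPID': ['BRUNS2', 'OH82CD85', '', np.nan],
})

nan_rows = GPID_blank(df, np.nan)        # rows where GPID is missing
empty_rows = GPID_blank(df, '')          # rows where GPID is the empty string
string_nan_rows = GPID_blank(df, 'NaN')  # rows where GPID is the literal text 'NaN'
```

With this sample, passing np.nan should select only client D, passing '' should select only client C, and passing the string 'NaN' should select nothing, which matches what the question describes.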

Upvotes: 1

It's not working because with variable = 'NaN' you're looking for a string whose content is 'NaN', not for missing values.

You can try:

import pandas as pd

def GPID_blank(df):
    # filtered dataframe with NaN values in GPID column
    blanks = df[df['GPID'].isnull()].copy()
    return blanks

filtered_df = GPID_blank(df)
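
Note that isnull() only catches real missing values, so the empty string in the question's example df would not be returned. If blank or whitespace-only strings should also count as missing, one possible variation is sketched below. The helper name GPID_missing and the sample rows are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def GPID_missing(df):
    # Hypothetical helper: treat NaN, '' and whitespace-only strings as missing
    is_nan = df['GPID'].isna()
    # astype(str) turns NaN into the text 'nan', which is harmless here
    # because real NaN values are already caught by is_nan
    is_blank = df['GPID'].astype(str).str.strip().eq('')
    return df[is_nan | is_blank].copy()

# Made-up sample covering each kind of "blank"
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D'],
    'GPID': ['BRUNS2', '', '   ', np.nan],
})

missing = GPID_missing(df)
```

With this sample, the result should contain clients B, C and D. Whether blank strings should be treated as missing depends on how the data was loaded, so this is a judgment call rather than a rule.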

Upvotes: 0

user17242583

Reputation:

You can't use == with NaN. NaN != NaN.

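Instead, you can modify your function a little to check if the parameter is NaN using pd.isna(). (np.isnan() also detects NaN, but it raises a TypeError for string arguments such as "OH82CD85", so pd.isna() is the safer choice here.)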

def GPID_blank(df, variable):
    if pd.isna(variable):
        return df.loc[df['GPID'].isna()]
    else:
        return df.loc[df['GPID'] == variable]
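
To see why == cannot find NaN, a short check along these lines may help. The three-element series is a made-up sample, not the asker's data.

```python
import numpy as np
import pandas as pd

nan = float('nan')
nan_equals_itself = (nan == nan)  # False: NaN is not equal to itself

# Made-up GPID sample with two missing values
gpid = pd.Series(['OH82CD85', np.nan, np.nan])

eq_matches = int((gpid == np.nan).sum())  # == never matches NaN
isna_matches = int(gpid.isna().sum())     # isna() finds the missing values

# np.isnan() does not accept strings, unlike pd.isna()
try:
    np.isnan('OH82CD85')
    isnan_accepts_strings = True
except TypeError:
    isnan_accepts_strings = False
```

Here the equality comparison matches no rows, while isna() finds both missing values.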

Upvotes: 1
