P A N

Reputation: 5922

Pandas: Filter in rows that have a Null/None/NaN value in any of several specific columns

I have a CSV file that contains many "NULL" strings in several columns.

I would like to select (filter in) rows that have a "NULL" value in any of several specific columns.

Example:

["Firstname"]   ["Lastname"]    ["Profession"]
"Jeff"          "Goldblum"      "NULL"
"NULL"          "Coltrane"      "Musician"
"Richard"       "NULL"          "Physicist"

Here, I would like to filter in (select) rows in df that have the value "NULL" in the column "Firstname" or "Lastname" – but not if the value is "NULL" in "Profession".

This manages to filter in strings (not None) in one column:

df = df[df["Firstname"].str.contains("NULL", case=False)]


I have, however, attempted to convert the "NULL" strings to None via:

df = df.where((pd.notnull(df)), None)
df.columns = df.columns.str.lower()

Given the str.contains filtering above, perhaps it's easier to filter in the "NULL" strings before converting them to None?
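One way to do that without any conversion is to compare the raw strings directly and combine the conditions with |. A minimal sketch, using the example data from the question:

```python
import pandas as pd

# Example frame with the literal string "NULL" (column names from the question)
df = pd.DataFrame({
    "Firstname": ["Jeff", "NULL", "Richard"],
    "Lastname": ["Goldblum", "Coltrane", "NULL"],
    "Profession": ["NULL", "Musician", "Physicist"],
})

# Keep rows where either name column holds the literal "NULL" string;
# "NULL" in Profession alone does not match
mask = (df["Firstname"] == "NULL") | (df["Lastname"] == "NULL")
print(df[mask])
```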

Upvotes: 3

Views: 7743

Answers (2)

jezrael

Reputation: 862841

I think you first need to replace the "NULL" strings with NaN. Then check for NaN values in the selected columns with isnull and select all rows where any value is True via boolean indexing:

import numpy as np

df = df.replace("NULL", np.nan)

print (df[['Firstname','Lastname']].isnull())
  Firstname Lastname
0     False    False
1      True    False
2     False     True

print (df[df[['Firstname','Lastname']].isnull().any(axis=1)])
  Firstname  Lastname Profession
1       NaN  Coltrane   Musician
2   Richard       NaN  Physicist
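Put together as a self-contained sketch, using the example data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Firstname": ["Jeff", "NULL", "Richard"],
    "Lastname": ["Goldblum", "Coltrane", "NULL"],
    "Profession": ["NULL", "Musician", "Physicist"],
})

# Turn the literal "NULL" strings into real missing values
df = df.replace("NULL", np.nan)

# Keep rows with NaN in either name column
out = df[df[["Firstname", "Lastname"]].isnull().any(axis=1)]
print(out)
```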

Upvotes: 7

JMat

Reputation: 737

You can try:

df = df.replace(to_replace="NULL", value=None)

to replace all occurrences of "NULL" with None.
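Note that replace returns a new DataFrame rather than modifying in place, so the result has to be assigned back. Also, the handling of value=None has differed across pandas versions, so np.nan is a safer target when the goal is later isnull filtering. A small sketch of that variant:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Firstname": ["Jeff", "NULL"], "Lastname": ["NULL", "Coltrane"]})

# replace() returns a copy; assign it back to keep the change
df = df.replace(to_replace="NULL", value=np.nan)
print(df.isnull())
```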

Upvotes: 1
