Reputation: 5922
I have a csv file which has a lot of strings called "NULL" in it, in several columns.
I would like to select (filter in) rows that have a "NULL" value in any of several specific columns.
Example:
  Firstname  Lastname   Profession
  Jeff       Goldblum   NULL
  NULL       Coltrane   Musician
  Richard    NULL       Physicist
Here, I would like to filter in (select) rows in df that have the value "NULL" in the column "Firstname" or "Lastname" – but not if the value is "NULL" in "Profession".
This manages to filter in strings (not None) in one column:
df = df[df["Firstname"].str.contains("NULL", case=False)]
I have, however, attempted to convert the "NULL" strings to None via:
df = df.where((pd.notnull(df)), None)
df.columns = df.columns.str.lower()
Given the above str.contains filtering, perhaps it's easier to filter in the "NULL" strings before converting them to None?
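For context, a minimal sketch of filtering on the raw "NULL" strings across both name columns (column names and sample data taken from the example above) might look like:

```python
import pandas as pd

df = pd.DataFrame({
    "Firstname": ["Jeff", "NULL", "Richard"],
    "Lastname": ["Goldblum", "Coltrane", "NULL"],
    "Profession": ["NULL", "Musician", "Physicist"],
})

# Keep rows where either name column holds the literal string "NULL"
mask = df["Firstname"].eq("NULL") | df["Lastname"].eq("NULL")
print(df[mask])
```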
Upvotes: 3
Views: 7743
Reputation: 862841
I think you need to first replace the NULL strings with NaN. Then check for NaN values in the selected columns with isnull and select all rows where any value is True, using boolean indexing:
import numpy as np

df = df.replace("NULL", np.nan)
print (df[['Firstname','Lastname']].isnull())
Firstname Lastname
0 False False
1 True False
2 False True
print (df[df[['Firstname','Lastname']].isnull().any(axis=1)])
Firstname Lastname Profession
1 NaN Coltrane Musician
2 Richard NaN Physicist
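Put together as a runnable sketch (the DataFrame is reconstructed here from the question's sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Firstname": ["Jeff", "NULL", "Richard"],
    "Lastname": ["Goldblum", "Coltrane", "NULL"],
    "Profession": ["NULL", "Musician", "Physicist"],
})

# Replace the literal "NULL" strings with real missing values
df = df.replace("NULL", np.nan)

# Keep rows with a missing value in either name column
result = df[df[["Firstname", "Lastname"]].isnull().any(axis=1)]
print(result)
```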
Upvotes: 7
Reputation: 737
You can try:
df.replace(to_replace="NULL", value=None)
to replace all occurrences of "NULL" with None.
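One caveat: with a scalar to_replace, some pandas versions interpret value=None as a request for method-based filling rather than as the replacement value. If that proves to be an issue, the dictionary form of replace should be unambiguous (sample data assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "Firstname": ["Jeff", "NULL", "Richard"],
    "Lastname": ["Goldblum", "Coltrane", "NULL"],
})

# Dict form: unambiguously map the "NULL" string to a missing value
df = df.replace({"NULL": None})
print(df)
```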
Upvotes: 1