Reputation: 75
I have a dataframe and I need to create another dataframe that will include only records that are missing on a particular column.
'Missing' means it can be empty, 'N/A', 'NODATA' or in some cases other fixed values (I need a function so I can simply pass a list that fits the case).
I can filter it by the values, so the N/As and NODATA records are found, but it doesn't see the empty ones.
I tried multiple approaches, is it possible to do that in that way (so an empty value is a part of the list), or do I need to filter empty separately and merge it to the list-ones?
def GetEmpty(df, on, items_list=['']):
column = df[on]
boolean_series = column.isin(items_list)
empty_df = df[boolean_series]
return empty_df
empty_df = GetEmpty(df, 'Isin', ['','N/A', 'NODATA'])
Edit: the dataframe:
System Book Isin ID
0 ABC DEF XYZ A
1 ABC DEF NaN B
2 ABC DEF ABC C
3 ABC DEF NODATA D
Upvotes: 4
Views: 1855
Reputation: 434
If I understand correctly, you're looking to check if it's empty (ie: Python's None or Numpy.NaN) AND check if it's one of these special values - the empty string (""), "N/A", or "NODATA".
Pandas has a build-in function called isnull() which checks for None or np.NaN.
So, you could do something like this...
import pandas as pd
def GetEmpty(df, on, items_list=['']):
boolean_series = df.apply(lambda x: True if ((pd.isnull(x[on])) or (x[on] in items_list)) else False, axis=1)
empty_df = df[boolean_series]
return empty_df
Upvotes: 2