Zdunkov
Zdunkov

Reputation: 75

filter pandas dataframe by a list that includes an empty values

I have a dataframe and I need to create another dataframe that will include only records that are missing on a particular column.

'Missing' means it can be empty, 'N/A', 'NODATA' or in some cases other fixed values (I need a function so I can simply pass a list that fits the case).

I can filter it by the values, so the N/As and NODATA records are found, but it doesn't see the empty ones.

I tried multiple approaches, is it possible to do that in that way (so an empty value is a part of the list), or do I need to filter empty separately and merge it to the list-ones?

def GetEmpty(df, on, items_list=['']):
    column = df[on]
    boolean_series = column.isin(items_list)
    empty_df = df[boolean_series]
    return empty_df

empty_df = GetEmpty(df, 'Isin', ['','N/A', 'NODATA'])

Edit: the dataframe:

 System Book    Isin ID
0    ABC  DEF     XYZ  A
1    ABC  DEF     NaN  B
2    ABC  DEF     ABC  C
3    ABC  DEF  NODATA  D

Upvotes: 4

Views: 1855

Answers (1)

jeffsdata
jeffsdata

Reputation: 434

If I understand correctly, you're looking to check if it's empty (ie: Python's None or Numpy.NaN) AND check if it's one of these special values - the empty string (""), "N/A", or "NODATA".

Pandas has a build-in function called isnull() which checks for None or np.NaN.

So, you could do something like this...

import pandas as pd

def GetEmpty(df, on, items_list=['']):
        boolean_series = df.apply(lambda x: True if ((pd.isnull(x[on])) or (x[on] in items_list)) else False, axis=1)
        empty_df = df[boolean_series]
        return empty_df

Upvotes: 2

Related Questions