user3222101
user3222101

Reputation: 1330

how to filter pandas dataframe on the basis of condition?

I have a pandas dataframe which is built from a table with unique t_id and s_id and I want to drop the records from this dataframe for the t_id's that has country_date null for all of the s_id's.

data like:

t_id s_id country_date
T1   S1   jan
T1   S2   mar
T2   S1   
T2   S2
T3   S2   jan
T3   S3   

result :

t_id s_id country_date
T1   S1   jan
T1   S2   mar
T3   S2   jan
T3   S3   

i have written the below line but that's wrong:

raw_data.groupby("t_id").country_date.max().notnull()

Please could you provide the way to filter dataframe records on the above criteria.Also, print the t_ids that are filtered out.

Upvotes: 2

Views: 65

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Use isnull and all:

df.groupby('t_id').filter(lambda x: ~x.country_date.isnull().all())

If those blanks are '' and not NaN you might need to:

df.replace('',pd.np.nan).groupby('t_id').filter(lambda x: ~x.country_date.isnull().all())

Output:

  t_id s_id country_date
0   T1   S1          jan
1   T1   S2          mar
4   T3   S2          jan
5   T3   S3          NaN

And, to see those ids that were dropped:

df.groupby('t_id').filter(lambda x: x.country_date.isnull().all())['t_id'].unique()

Output:

array(['T2'], dtype=object)

Upvotes: 3

Related Questions