Reputation: 1330
I have a pandas DataFrame built from a table with unique t_id and s_id, and I want to drop the records for the t_ids whose country_date is null for all of their s_ids.
data like:
t_id  s_id  country_date
T1    S1    jan
T1    S2    mar
T2    S1
T2    S2
T3    S2    jan
T3    S3
result:
t_id  s_id  country_date
T1    S1    jan
T1    S2    mar
T3    S2    jan
T3    S3
I have written the line below, but it's wrong:
raw_data.groupby("t_id").country_date.max().notnull()
Could you please show me how to filter the DataFrame records on the above criteria? Also, print the t_ids that are filtered out.
Upvotes: 2
Views: 65
Reputation: 153460
Use isnull and all:
df.groupby('t_id').filter(lambda x: ~x.country_date.isnull().all())
If those blanks are '' and not NaN, you might need to replace them first (import numpy as np; pd.np was deprecated and later removed):
df.replace('', np.nan).groupby('t_id').filter(lambda x: ~x.country_date.isnull().all())
Output:
t_id s_id country_date
0 T1 S1 jan
1 T1 S2 mar
4 T3 S2 jan
5 T3 S3 NaN
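As an aside, groupby.filter calls the lambda once per group, which can be slow on large frames. A minimal sketch of an equivalent, usually faster approach uses transform to build a boolean mask (the df below just recreates the sample data from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "t_id": ["T1", "T1", "T2", "T2", "T3", "T3"],
    "s_id": ["S1", "S2", "S1", "S2", "S2", "S3"],
    "country_date": ["jan", "mar", np.nan, np.nan, "jan", np.nan],
})

# True for every row whose t_id group has at least one non-null country_date
mask = df.groupby("t_id")["country_date"].transform(lambda s: s.notna().any())
kept = df[mask]
```

transform broadcasts the per-group result back to row level, so kept keeps the original index, just like the filter version.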
And, to see those ids that were dropped:
df.groupby('t_id').filter(lambda x: x.country_date.isnull().all())['t_id'].unique()
Output:
array(['T2'], dtype=object)
Upvotes: 3