Reputation: 1946
I have the following pandas DataFrame:
import pandas as pd

a = [['01', '12345', 'null'], ['02', '78910', '9870'], ['01', '23456', 'null'], ['01', '98765', '8760']]
df_a = pd.DataFrame(a, columns=['id', 'order', 'location'])
I need to get a count of how many NULL values (NULL is a string) occur for each ID. The result would look like:
id  null_count
01  2
I can get basic counts using a groupby:
new_df = df_a.groupby(['id', 'location'])['id'].count()
But the results include more than just the NULL values:
id  location
01  8760        1
    null        2
02  9870        1
Upvotes: 4
Views: 114
Reputation: 323226
Based on your own code, add .loc. Notice this is a multi-index slice:
df_a.groupby(['id', 'location'])['id'].count().loc[:,'null']
Out[932]:
id
01 2
Name: id, dtype: int64
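The same selection can also be written with Series.xs, which names the level being sliced; a minimal sketch, assuming the grouped count from above:

# Equivalent selection: pull the 'null' slice out of the 'location'
# level of the MultiIndex (same result as .loc[:, 'null'])
df_a.groupby(['id', 'location'])['id'].count().xs('null', level='location')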
Upvotes: 5
Reputation: 210832
In [16]: df_a.set_index('id')['location'].eq('null').sum(level=0)
Out[16]:
id
01 2.0
02 0.0
Name: location, dtype: float64
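Note that sum(level=0) was deprecated in pandas 1.3 and removed in 2.0; a sketch of the equivalent on modern versions, assuming the same frame:

# groupby(level=0) replaces sum(level=0); the cast keeps the
# counts as integers rather than the float output shown above
df_a.set_index('id')['location'].eq('null').groupby(level=0).sum().astype(int)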
Upvotes: 4
Reputation: 153460
Because the NULLs in your source DataFrame are the string 'null', use:
df_a.groupby('id')['location'].apply(lambda x: (x=='null').sum())\
.reset_index(name='null_count')
Output:
id null_count
0 01 2
1 02 0
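If the apply/lambda is slow on a larger frame, a vectorized variant of the same count (a sketch, using only the columns defined in the question) is:

# Compare the column to 'null' once, then group the boolean
# Series by id and sum the True values
df_a['location'].eq('null').groupby(df_a['id']).sum().reset_index(name='null_count')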
Or, filtering first. Note this drops ids with no nulls (such as 02), because query removes their rows before grouping; see the sketch after the output for a way to bring them back:
df_a.query('location == "null"').groupby('id')['location'].size()\
.reset_index(name='null_count')
Output:
id null_count
0 01 2
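If every id should appear even with a zero count, one option is to reindex against the full set of ids; a sketch, assuming df_a['id'].unique() covers the ids you want:

# Count 'null' rows per id, then reindex so ids with no
# nulls come back as 0 instead of disappearing
(df_a.query('location == "null"')
     .groupby('id')['location'].size()
     .reindex(df_a['id'].unique(), fill_value=0)
     .rename_axis('id')
     .reset_index(name='null_count'))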
Upvotes: 6