Reputation: 59
df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice','Jane', 'Alice','Bob', 'Alice'],
'income': [40000, np.nan, 42000, 50000, np.nan, np.nan, 30000]})
user income
0 Bob 40000.0
1 Jane NaN
2 Alice 42000.0
3 Jane 50000.0
4 Alice NaN
5 Bob NaN
6 Alice 30000.0
I want to find the count of all the Null Values in 'income' column based on 'user' column in my df ?
I'm trying something like this: len(df[df.income.isnull().sum()])
but it is incomplete.
Upvotes: 2
Views: 1062
Reputation: 17911
You can use the method value_counts()
:
df.loc[df['income'].isna(), 'user'].value_counts()
Output:
Jane 1
Bob 1
Alice 1
Name: user, dtype: int64
Upvotes: 3
Reputation: 30930
I would use pdcrosstab
pd.crosstab(df['user'], df['income'].isnull())[True].sort_values(ascending = False)
#user
#Alice 1
#Bob 1
#Jane 1
#Name: True, dtype: int64
or GroupBy.sum
df['income'].isnull().groupby(df['user']).sum().astype(int).sort_values(ascending=False)
#user
#Alice 1
#Bob 1
#Jane 1
#Name: income, dtype: int64
Note
I have used astype here because if the count is 1 it will return True
Upvotes: 2