Reputation: 5559
Hi I hope you could help me with this.
I have a dataframe df_test
import pandas as pd
import numpy as np
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
[4,5,6,"female"],
[1,2,np.nan,"female"]],
columns=['a','b','c','sex'])
df_test
a b c sex
0 NaN 2 3.0 male
1 4.0 5 6.0 female
2 1.0 2 NaN female
and for each category in the sex column I would like to count the number of values greater than 1 in each column of the dataframe (a,b,c)
.
df_results
a_count b_count c_count sex
0 NaN 1 1 male
1 2 2 1 female
Upvotes: 1
Views: 1013
Reputation: 150735
Let ust try:
(df_test.drop('sex', axis=1).ge(1) # compare the data with `1`
.groupby(df_test['sex'],sort=False).sum() # count the number of `True` with sum
.add_suffix('_count') # add the suffix
.reset_index() # make `sex` a column
)
Output:
sex a_count b_count c_count
0 male 0 1 1
1 female 2 2 1
Upvotes: 0
Reputation: 7779
You can use groupby
and aggregate to count the values :
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
[4,5,6,"female"],
[1,2,np.nan,"female"]],
columns=['a','b','c','sex'])
df_test.groupby('sex', sort=False).agg(lambda x : len(x.dropna()>1))
This gives us expected output :
a b c
sex
male 0 1 1
female 2 2 1
If at all you want those values the values to be Nan, then you can do
df_test.groupby('sex', sort=False).agg(lambda x : np.nan if len(x.dropna()) == 0 else len(x.dropna()))
a b c
sex
male NaN 1 1
female 2.0 2 1
Since the column contains NaN values, pandas does some internal optimization to convert int
to float
internally. So you may have to explicitly convert the column to int.
Upvotes: 2
Reputation: 2372
please correct
columns=[['a','b','c','sex']]
and replace with
columns=['a','b','c','sex']
then
pd.concat([df_test.sex, df_test.drop(columns=["sex"]) >= 1], axis=1).groupby("sex").sum().replace(0, np.nan).rename(columns=lambda x: x + "_count").reset_index()
Upvotes: 0
Reputation: 1625
Check Below code (I have assumed all values will be greater than or equal to 1 or nan based upon the example data)
pd.DataFrame(np.where(df_test.values == 1, np.nan, df_test.values), columns = df_test.columns).groupby(“sex”).count().reset_index()
Upvotes: 0