gabboshow
gabboshow

Reputation: 5559

Count number of elements greater than 1 in pandas dataframe

Hi I hope you could help me with this. I have a dataframe df_test

import pandas as pd
import numpy as np
df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
                             [4,5,6,"female"],
                             [1,2,np.nan,"female"]],
                       columns=['a','b','c','sex'])


df_test 
         a  b  c       sex
    0  NaN  2  3.0     male
    1  4.0  5  6.0     female
    2  1.0  2  NaN     female

and for each category in the sex column I would like to count the number of values greater than 1 in each column of the dataframe (a,b,c).

df_results 
         a_count  b_count  c_count       sex
    0    NaN      1        1             male
    1    2        2        1             female

Upvotes: 1

Views: 1013

Answers (4)

Quang Hoang
Quang Hoang

Reputation: 150735

Let ust try:

(df_test.drop('sex', axis=1).ge(1)           # compare the data with `1`
   .groupby(df_test['sex'],sort=False).sum() # count the number of `True` with sum
   .add_suffix('_count')                     # add the suffix
   .reset_index()                            # make `sex` a column
)

Output:

      sex  a_count  b_count  c_count
0    male        0        1        1
1  female        2        2        1

Upvotes: 0

Himanshu Poddar
Himanshu Poddar

Reputation: 7779

You can use groupby and aggregate to count the values :

df_test = pd.DataFrame(data=[[np.nan,2,3,"male"],
                             [4,5,6,"female"],
                             [1,2,np.nan,"female"]],
                       columns=['a','b','c','sex'])

df_test.groupby('sex', sort=False).agg(lambda x : len(x.dropna()>1))

This gives us expected output :

        a  b  c
sex            
male    0  1  1
female  2  2  1

If at all you want those values the values to be Nan, then you can do

df_test.groupby('sex', sort=False).agg(lambda x : np.nan if len(x.dropna()) == 0 else len(x.dropna()))

          a  b  c
sex              
male    NaN  1  1
female  2.0  2  1

Since the column contains NaN values, pandas does some internal optimization to convert int to float internally. So you may have to explicitly convert the column to int.

Upvotes: 2

MoRe
MoRe

Reputation: 2372

please correct

columns=[['a','b','c','sex']]

and replace with

columns=['a','b','c','sex']

then

pd.concat([df_test.sex, df_test.drop(columns=["sex"]) >= 1], axis=1).groupby("sex").sum().replace(0, np.nan).rename(columns=lambda x: x + "_count").reset_index()

Upvotes: 0

Abhishek
Abhishek

Reputation: 1625

Check Below code (I have assumed all values will be greater than or equal to 1 or nan based upon the example data)

pd.DataFrame(np.where(df_test.values == 1, np.nan, df_test.values), columns = df_test.columns).groupby(“sex”).count().reset_index()

Upvotes: 0

Related Questions