bdubs88

Reputation: 19

Getting size() or groupby & count to work across all columns

Sorry if this seems repetitive; I've found a lot of close answers using groupby and size, but none that return the column headers as the index.

I have the following df (which actually has 340 columns and many rows):

import pandas as pd

data = {
    'Name_Clean_40_40_Correct': ['0', '1', '0', '0'],
    'Name_Clean_40_80_Correct': ['0', '1', '1', 'N/A'],
    'Name_Clean_40_60_Correct': ['N/A', 'N/A', '0', '1'],
}
df_third = pd.DataFrame(data)

I am trying to count the instances of '0', '1', and 'N/A' in each column, so I'd like the index to be the column names and the columns to be '0', '1', and 'N/A'.
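For the sample above, the desired result would look something like this (counted by hand from the data):

                            0    1    N/A
Name_Clean_40_40_Correct    3    1    0
Name_Clean_40_80_Correct    1    2    1
Name_Clean_40_60_Correct    1    1    2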

I was trying this, but I'm afraid it is very inefficient or incorrect, since it never completes:

def countx(x, colname):
    df_thresholds = df_third.groupby(colname).count()

for col in df_thresholds.columns:
    df_thresholds[col + '_Count'] = df_third.apply(countx, axis=1, args=(col,))

I can do it for one column, but doing that for all 340 columns would be a pain:

df_thresholds = df_third.groupby('Name_Clean_100_100_Correct').count()
df_thresholds = df_thresholds[['Name_Raw']]
df_thresholds = df_thresholds.T
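Spelled out, that per-column approach would mean repeating the block for every column, roughly like this (a sketch using the sample columns; the real frame has 340 of them):

df_t1 = df_third.groupby('Name_Clean_40_40_Correct').count().T
df_t2 = df_third.groupby('Name_Clean_40_80_Correct').count().T
df_t3 = df_third.groupby('Name_Clean_40_60_Correct').count().T
# ...337 more of these, then stitch the pieces together with pd.concat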

Upvotes: 0

Views: 29

Answers (1)

ipj

Reputation: 3598

If I understand correctly, this should work:

df_third.apply(pd.Series.value_counts)

result:

     Name_Clean_40_40_Correct  ...  Name_Clean_40_60_Correct
0                         3.0  ...                         1
1                         1.0  ...                         1
N/A                       NaN  ...                         2

BTW: to select only columns containing 'Correct':

df_third.filter(like='Correct')

Transposed form, df_third.apply(pd.Series.value_counts).T:

                            0    1  N/A
Name_Clean_40_40_Correct  3.0  1.0  NaN
Name_Clean_40_80_Correct  1.0  2.0  1.0
Name_Clean_40_60_Correct  1.0  1.0  2.0
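Putting the pieces together for the larger frame (a sketch; the fillna/astype steps are optional, for integer counts instead of floats with NaN):

result = (
    df_third.filter(like='Correct')        # keep only the *_Correct columns
            .apply(pd.Series.value_counts) # count '0', '1', 'N/A' per column
            .T                             # column names become the index
            .fillna(0)                     # values absent from a column count as 0
            .astype(int)
)
print(result)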

Upvotes: 1
