cyotani
cyotani

Reputation: 61

pandas aggregate value counts across multiple columns into summary dataframe

I am looking for a way to tabulate the pandas value counts per column into a summary table. I've found a way to acheive what I want but Pandas must have a better way to do this.

the data frame has mutliple test steps with 'P' 'F' or ' ' data for each test run.

step1 = list('PPFP PFP ')
step2 = list('PFFP  FPF')
step3 = list(' PPPFFPFP')
step4 = list(' PPFPF PP')

df = pd.DataFrame({'step1': step1,'step2':step2, 'step3':step3,'step4':step4})

  step1 step2 step3 step4
0     P     P            
1     P     F     P     P
2     F     F     P     P
3     P     P     P     F
4                 F     P
5     P           F     F
6     F     F     P      
7     P     P     F     P
8           F     P     P

the output I am looking for is:

   step1  step2  step3  step4
P      5      3      5      5
F      2      4      3      2
       2      2      1      2

I've been able to solve this by looping through each column, doing value_counts then appending it to an output array but this seems clunky.

df2 = pd.DataFrame(index=['P', 'F', ' '])

for i in range(len(df.columns)):
    df2[df.columns.tolist()[i]] = df.iloc[:, i].value_counts(dropna=False)

Is there a more elegant way to accomplish this?

Upvotes: 1

Views: 1953

Answers (1)

jezrael
jezrael

Reputation: 863611

Use DataFrame.apply with value_counts:

df2 = df.apply(pd.value_counts)
print (df2)
   step1  step2  step3  step4
       2      2      1      2
F      2      4      3      2
P      5      3      5      5

For change order of rows add DataFrame.reindex by list of all values in index in list in expected order:

df2 = df.apply(pd.value_counts).reindex([' ','P','F'])
print (df2)
   step1  step2  step3  step4
       2      2      1      2
P      5      3      5      5
F      2      4      3      2

Upvotes: 4

Related Questions