marillion
marillion

Reputation: 11180

Pandas Frequency of subcategories in a GroupBy

I have a DataFrame as shown below:

enter image description here

First I would like to get the overall frequencies of the CODE values, call it FREQ, then frequencies of the CODE values within each AXLE group and call it GROUP_FREQ.

I was able to calculate the FREQ column using the below code:

pivot = df[['AXLES','CODE']].pivot(['CODE']).agg(['count','mean','min','max'])
pivot['FREQ']=grouped_df.AXLES['count']/pivot.AXLES['count'].sum()*100`

this provides a nice grouped DataFrame as shown below:

enter image description here

However, I could not figure out how to calculate the frequencies within each AXLE group using this grouped_by DataFrame in the next step.

I tried:

pivot['GROUPFREQ']=pivot['AXLES','mean']['count']/pivot['AXLES','mean']['count'].sum()*100

However, this gives a KeyError: 'count'.

I may be on the wrong path, and what I am trying to achieve may not be done using groupby. I decided to check with the community after spending a couple of hours of trial and errors. I'd be glad if you could let me know what you think.

Thanks!

EDIT:

Reproducible input DataFrame:

,CODE,AXLES
0,0101,5
1,001,4
2,0110111,8
3,010111,7
4,0100,5
5,0101,5
6,0110111,8
7,00111,6
8,00111,6
9,0110111,8
10,0100,5
11,0110011,8
12,01011,6
13,0110111,8
14,0110111,8
15,011011,7
16,011011,7
17,011011,7
18,01011,6
19,01011,6

Desired Output for pivot DataFrame:

CODE,COUNT,AXLES,FREQ,GROUPFREQ
001,1,4,0.05,1.00
00111,2,6,0.1,0.40
0100,2,5,0.1,0.50
0101,2,5,0.1,0.50
01011,3,6,0.15,0.60
010111,1,7,0.05,0.25
0110011,1,8,0.05,0.17
011011,3,7,0.15,0.75
0110111,5,8,0.25,0.83

For the first line of the output: 001 is seen only once in the whole data set (20 records). Thus FREQ = 1/20 = 0.05 When the data is grouped by AXLES, for the AXLES=4 group, 001 is the only record, thus the GROUPFREQ = 1/1 = 1.00. (The same code cannot occur under various AXLE groups, so 001 only needs to be checked for AXLES=4.)

Upvotes: 0

Views: 1478

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150785

Do you mean:

pivot['FREQ'] = df.groupby('AXLES').CODE.value_counts(normalize=True).reset_index(level=0,drop=True)

Output:

       AXLES                   FREQ
       count mean min max          
CODE                               
1          1    4   4   4  1.000000
100        2    5   5   5  0.500000
101        2    5   5   5  0.500000
111        2    6   6   6  0.400000
1011       3    6   6   6  0.600000
10111      1    7   7   7  0.250000
11011      3    7   7   7  0.750000
110011     1    8   8   8  0.166667
110111     5    8   8   8  0.833333

Upvotes: 1

Related Questions