Getting percentages after binning pandas dataframe

Question

Based on the following mock DF:

df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2:"AZ", 3: "AZ", 4: "AK", 5: "AK", 6 : "AK", 7: "AK"},
                 '# of Boxes': {0: 1, 1: 2, 2:2, 3: 1, 4: 2, 5: 2, 6 : 1, 7: 2},
                 'Price': {0: 2, 1: 4, 2:15, 3: 25, 4: 17, 5: 13, 6 : 3, 7: 3}},
                 columns=['State', '# of Boxes', 'Price'])

print(df)
  State  # of Boxes  Price
0    AZ           1      2
1    AZ           2      4
2    AZ           2     15
3    AZ           1     25
4    AK           2     17
5    AK           2     13
6    AK           1      3
7    AK           2      3

I want to bin the Prices as (0, 15], (15, 30], then get the % of the total by box, by state.

State    Box    Price (0,15]    Price (15,30]
 AZ      1        .5             .5
 AZ      2        1              0
 AK      1        1              0
 AK      2        .66            .33

I've tried pivoting using an agg function but I can't seem to figure it out.

Thank you!

jezrael · Accepted Answer

I think you can use groupby by columns with binned Series created by cut, aggregated by size and reshape by unstack:

print (pd.cut(df['Price'], bins=[0,15,30]))
0     (0, 15]
1     (0, 15]
2     (0, 15]
3    (15, 30]
4    (15, 30]
5     (0, 15]
6     (0, 15]
7     (0, 15]
Name: Price, dtype: category
Categories (2, object): [(0, 15] < (15, 30]

df1 = df.Price.groupby([df['State'],df['# of Boxes'],pd.cut(df['Price'], bins=[0,15,30])])
              .size()
              .unstack(fill_value=0)

print (df1)
Price             (0, 15]  (15, 30]
State # of Boxes                   
AK    1                 1         0
      2                 2         1
AZ    1                 1         1
      2                 2         0

Then divide all values by sum with div

df1 = df1.div(df1.sum(axis=1), axis=0)
print (df1)
Price              (0, 15]  (15, 30]
State # of Boxes                    
AK    1           1.000000  0.000000
      2           0.666667  0.333333
AZ    1           0.500000  0.500000
      2           1.000000  0.000000

Timings:

In [135]: %timeit (jez(df))
100 loops, best of 3: 3.51 ms per loop

In [136]: %timeit (maxu(df))
100 loops, best of 3: 6.21 ms per loop

def jez(df):
    df1 = df.Price.groupby([df['State'],df['# of Boxes'],pd.cut(df['Price'], bins=[0,15,30])]).size().unstack(fill_value=0)
    return  df1.div(df1.sum(1), axis=0)

def maxu(df):    
    pvt = df.assign(bins=pd.cut(df.Price, [0,15,30])).pivot_table(index=['State','# of Boxes'], columns='bins', aggfunc='size', fill_value=0)
    return pvt.apply(lambda x: x/pvt.sum(1))

Getting percentages after binning pandas dataframe

Answers (2)

Related Questions