Reputation: 445
Based on the following mock DF:
df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2:"AZ", 3: "AZ", 4: "AK", 5: "AK", 6 : "AK", 7: "AK"},
'# of Boxes': {0: 1, 1: 2, 2:2, 3: 1, 4: 2, 5: 2, 6 : 1, 7: 2},
'Price': {0: 2, 1: 4, 2:15, 3: 25, 4: 17, 5: 13, 6 : 3, 7: 3}},
columns=['State', '# of Boxes', 'Price'])
print(df)
State # of Boxes Price
0 AZ 1 2
1 AZ 2 4
2 AZ 2 15
3 AZ 1 25
4 AK 2 17
5 AK 2 13
6 AK 1 3
7 AK 2 3
I want to bin the Prices as (0, 15], (15, 30], then get the % of the total by box, by state.
State Box Price (0,15] Price (15,30]
AZ 1 .5 .5
AZ 2 1 0
AK 1 1 0
AK 2 .66 .33
I've tried pivoting using an agg function but I can't seem to figure it out.
Thank you!
Upvotes: 3
Views: 2174
Reputation: 863741
I think you can use groupby
by columns with binned Series
created by cut
, aggregated by size
and reshape by unstack
:
print (pd.cut(df['Price'], bins=[0,15,30]))
0 (0, 15]
1 (0, 15]
2 (0, 15]
3 (15, 30]
4 (15, 30]
5 (0, 15]
6 (0, 15]
7 (0, 15]
Name: Price, dtype: category
Categories (2, object): [(0, 15] < (15, 30]
df1 = df.Price.groupby([df['State'],df['# of Boxes'],pd.cut(df['Price'], bins=[0,15,30])])
.size()
.unstack(fill_value=0)
print (df1)
Price (0, 15] (15, 30]
State # of Boxes
AK 1 1 0
2 2 1
AZ 1 1 1
2 2 0
Then divide all values by sum
with div
df1 = df1.div(df1.sum(axis=1), axis=0)
print (df1)
Price (0, 15] (15, 30]
State # of Boxes
AK 1 1.000000 0.000000
2 0.666667 0.333333
AZ 1 0.500000 0.500000
2 1.000000 0.000000
Timings:
In [135]: %timeit (jez(df))
100 loops, best of 3: 3.51 ms per loop
In [136]: %timeit (maxu(df))
100 loops, best of 3: 6.21 ms per loop
def jez(df):
df1 = df.Price.groupby([df['State'],df['# of Boxes'],pd.cut(df['Price'], bins=[0,15,30])]).size().unstack(fill_value=0)
return df1.div(df1.sum(1), axis=0)
def maxu(df):
pvt = df.assign(bins=pd.cut(df.Price, [0,15,30])).pivot_table(index=['State','# of Boxes'], columns='bins', aggfunc='size', fill_value=0)
return pvt.apply(lambda x: x/pvt.sum(1))
Upvotes: 4
Reputation: 210982
Here is a solution using pivot_table()
method:
In [57]: pvt = (df.assign(bins=pd.cut(df.Price, [0,15,30]))
....: .pivot_table(index=['State','# of Boxes'],
....: columns='bins', aggfunc='size', fill_value=0)
....: )
In [58]: pvt
Out[58]:
bins (0, 15] (15, 30]
State # of Boxes
AK 1 1 0
2 2 1
AZ 1 1 1
2 2 0
In [59]: pvt.apply(lambda x: x/pvt.sum(1))
Out[59]:
bins (0, 15] (15, 30]
State # of Boxes
AK 1 1.000000 0.000000
2 0.666667 0.333333
AZ 1 0.500000 0.500000
2 1.000000 0.000000
Upvotes: 2