Reputation: 336
Having a dataset, I have to group it in several ways (with MultiIndex), perform some aggregations and export results. One of these operations is binning (bucketing) on column with prices to obtain amount of elements in each bucket. I need 3 buckets where:
Example:
Product | Country | Sell | col | price | colb |
---|---|---|---|---|---|
First | DE | A | b | 100 | x |
Second | DE | A | g | 105 | z |
First | FR | A | b | 111 | x |
Second | FR | A | g | 100 | z |
First | DE | B | b | 109 | x |
Second | DE | B | g | 120 | z |
First | FR | B | b | 100 | x |
Second | FR | B | g | 200 | z |
What I expect:
Product | Country | Sell | 1x | 1.1x | >1.5x |
---|---|---|---|---|---|
First | DE | A | 1 | 0 | 0 |
B | 0 | 1 | 0 | ||
FR | A | 0 | 1 | 0 | |
B | 1 | 0 | 0 | ||
Second | DE | A | 1 | 0 | 0 |
B | 0 | 1 | 0 | ||
FR | A | 1 | 0 | 0 | |
B | 0 | 0 | 1 |
Now, what I'm trying to do is following:
import numpy as np
import pandas as pd
# some code
df_low_price = df.groupby(["Product","Country","Sell"])["price"].sort_values(by="price").nth(0)
df_low_price_1_1x = df_low_price.map(lambda n: n * 1.1)
df_low_price_1_5x = df_low_price.map(lambda n: n * 1.5)
df_main = pd.concat([
df_low_price,
df_low_price_1_1x,
df_low_price_1_5x, axis=1
])
I'm getting the boundaries, but I don't get the size of the groups. I know that I should rely on pd.cut or cut, but I have no idea how to do it pythonic/pandas way. Thank you in advance for any suggestions.
Upvotes: 0
Views: 938
Reputation: 31156
unstack()
df = pd.DataFrame({'Product': ['First',
'Second',
'First',
'Second',
'First',
'Second',
'First',
'Second'],
'Country': ['DE', 'DE', 'FR', 'FR', 'DE', 'DE', 'FR', 'FR'],
'Sell': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'col': ['b', 'g', 'b', 'g', 'b', 'g', 'b', 'g'],
'price': [100, 105, 111, 100, 109, 120, 100, 200],
'colb': ['x', 'z', 'x', 'z', 'x', 'z', 'x', 'z'],
'bin': ['1x', '1x', '1.1x', '1x', '1x', '1.1x', '1x', '>1.5x']})
df["bin"] = pd.cut(
df["price"],
bins=[
df["price"].min() - 1,
df["price"].min() * 1.1,
df["price"].min() * 1.5,
df["price"].max(),
],
labels=["1x", "1.1x", ">1.5x"],
)
df.groupby(["Product","Country","Sell","bin"]).agg({"col":"count"}).unstack().droplevel(0,1)
bin 1x 1.1x >1.5x
Product Country Sell
First DE A 1 0 0
B 1 0 0
FR A 0 1 0
B 1 0 0
Second DE A 1 0 0
B 0 1 0
FR A 1 0 0
B 0 0 1
Upvotes: 1