How I can dynamically define bins for grouped data using pandas?

Question

Having a dataset, I have to group it in several ways (with MultiIndex), perform some aggregations and export results. One of these operations is binning (bucketing) on column with prices to obtain amount of elements in each bucket. I need 3 buckets where:

in first bucket I obtain all elements for which the price is equal to the lowest price for that grouping or equal or not higher than 110% of the lowest price
in second bucket I obtain all elements for which the price is higher than 110% of the lowest price, but or equal or not higher than 150% of the lowest price
in third bucket - the rest.

Example:

Product	Country	Sell	col	price	colb
First	DE	A	b	100	x
Second	DE	A	g	105	z
First	FR	A	b	111	x
Second	FR	A	g	100	z
First	DE	B	b	109	x
Second	DE	B	g	120	z
First	FR	B	b	100	x
Second	FR	B	g	200	z

What I expect:

Product	Country	Sell	1x	1.1x	>1.5x
First	DE	A	1	0	0
		B	0	1	0
	FR	A	0	1	0
		B	1	0	0
Second	DE	A	1	0	0
		B	0	1	0
	FR	A	1	0	0
		B	0	0	1

Now, what I'm trying to do is following:

import numpy as np
import pandas as pd

# some code

df_low_price = df.groupby(["Product","Country","Sell"])["price"].sort_values(by="price").nth(0)
df_low_price_1_1x = df_low_price.map(lambda n: n * 1.1)
df_low_price_1_5x = df_low_price.map(lambda n: n * 1.5)

df_main = pd.concat([
df_low_price,
df_low_price_1_1x,
df_low_price_1_5x, axis=1
])

I'm getting the boundaries, but I don't get the size of the groups. I know that I should rely on pd.cut or cut, but I have no idea how to do it pythonic/pandas way. Thank you in advance for any suggestions.

Rob Raymond · Accepted Answer

just followed what you describe
to bin cut
multi-index and columns groupby/agg and unstack()

df = pd.DataFrame({'Product': ['First',
  'Second',
  'First',
  'Second',
  'First',
  'Second',
  'First',
  'Second'],
 'Country': ['DE', 'DE', 'FR', 'FR', 'DE', 'DE', 'FR', 'FR'],
 'Sell': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
 'col': ['b', 'g', 'b', 'g', 'b', 'g', 'b', 'g'],
 'price': [100, 105, 111, 100, 109, 120, 100, 200],
 'colb': ['x', 'z', 'x', 'z', 'x', 'z', 'x', 'z'],
 'bin': ['1x', '1x', '1.1x', '1x', '1x', '1.1x', '1x', '>1.5x']})

df["bin"] = pd.cut(
    df["price"],
    bins=[
        df["price"].min() - 1,
        df["price"].min() * 1.1,
        df["price"].min() * 1.5,
        df["price"].max(),
    ],
    labels=["1x", "1.1x", ">1.5x"],
)

df.groupby(["Product","Country","Sell","bin"]).agg({"col":"count"}).unstack().droplevel(0,1)

output

bin                   1x  1.1x  >1.5x
Product Country Sell                 
First   DE      A      1     0      0
                B      1     0      0
        FR      A      0     1      0
                B      1     0      0
Second  DE      A      1     0      0
                B      0     1      0
        FR      A      1     0      0
                B      0     0      1

How I can dynamically define bins for grouped data using pandas?

Answers (1)

output

Related Questions