LeeHarveyOswald
LeeHarveyOswald

Reputation: 39

Assign Quantile Bins to Variable

I am using the pandas .qcut() function to divide a column 'AveragePrice' into 4 bins. I would like to assign each bin to a new variable. The reason for this is to do a separate analysis on each quartile. IE) I would like something like:

bin1 = quartile 1

bin2= quartile 2 

bin3 = quartile 3 

bin4= quantile 4 

Here is what I'm working with.

`pd.qcut(data['AveragePrice'], q=4)`

2        (0.439, 1.1]
3        (0.439, 1.1]
17596     (1.1, 1.38]
17600     (1.1, 1.38]

Name: AveragePrice, Length: 14127, dtype: category

Categories (4, interval[float64]): [(0.439, 1.1] < (1.1, 1.38] < (1.38, 1.69] < (1.69, 3.25]]

Upvotes: 2

Views: 902

Answers (1)

tdy
tdy

Reputation: 41407

If I understand correctly, you can "pivot" your quartile values into columns.

Toy example:

import pandas as pd
import numpy as np
df = pd.DataFrame({ 'AveragePrice': np.random.randint(0, 100, size=10) })
AveragePrice
0 20
1 29
2 53
3 30
4 3
5 4
6 78
7 62
8 75
9 1

Create the Quartile column, pivot Quartile into columns, and rename the columns to something more reader-friendly:

df['Quartile'] = pd.qcut(df.AveragePrice, q=4)
pivot = df.reset_index().pivot_table(
    index='index',
    columns='Quartile',
    values='AveragePrice')
pivot.columns = ['Q1', 'Q2', 'Q3', 'Q4']
Q1 Q2 Q3 Q4
0 NaN 20.0 NaN NaN
1 NaN 29.0 NaN NaN
2 NaN NaN 53.0 NaN
3 NaN NaN 30.0 NaN
4 3.0 NaN NaN NaN
5 4.0 NaN NaN NaN
6 NaN NaN NaN 78.0
7 NaN NaN NaN 62.0
8 NaN NaN NaN 75.0
9 1.0 NaN NaN NaN

Now you can analyze the bins separately, e.g., describe them:

pivot.describe()
Q1 Q2 Q3 Q4
count 3.000000 2.000000 2.000000 3.000000
mean 2.666667 24.500000 41.500000 71.666667
std 1.527525 6.363961 16.263456 8.504901
min 1.000000 20.000000 30.000000 62.000000
25% 2.000000 22.250000 35.750000 68.500000
50% 3.000000 24.500000 41.500000 75.000000
75% 3.500000 26.750000 47.250000 76.500000
max 4.000000 29.000000 53.000000 78.000000

Upvotes: 1

Related Questions