Reputation: 39
I am using the pandas .qcut() function to divide a column 'AveragePrice' into 4 bins. I would like to assign each bin to a new variable. The reason for this is to do a separate analysis on each quartile. IE) I would like something like:
bin1 = quartile 1
bin2= quartile 2
bin3 = quartile 3
bin4= quantile 4
Here is what I'm working with.
`pd.qcut(data['AveragePrice'], q=4)`
2 (0.439, 1.1]
3 (0.439, 1.1]
17596 (1.1, 1.38]
17600 (1.1, 1.38]
Name: AveragePrice, Length: 14127, dtype: category
Categories (4, interval[float64]): [(0.439, 1.1] < (1.1, 1.38] < (1.38, 1.69] < (1.69, 3.25]]
Upvotes: 2
Views: 902
Reputation: 41407
If I understand correctly, you can "pivot" your quartile values into columns.
Toy example:
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'AveragePrice': np.random.randint(0, 100, size=10) })
AveragePrice | |
---|---|
0 | 20 |
1 | 29 |
2 | 53 |
3 | 30 |
4 | 3 |
5 | 4 |
6 | 78 |
7 | 62 |
8 | 75 |
9 | 1 |
Create the Quartile
column, pivot Quartile
into columns, and rename the columns to something more reader-friendly:
df['Quartile'] = pd.qcut(df.AveragePrice, q=4)
pivot = df.reset_index().pivot_table(
index='index',
columns='Quartile',
values='AveragePrice')
pivot.columns = ['Q1', 'Q2', 'Q3', 'Q4']
Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|
0 | NaN | 20.0 | NaN | NaN |
1 | NaN | 29.0 | NaN | NaN |
2 | NaN | NaN | 53.0 | NaN |
3 | NaN | NaN | 30.0 | NaN |
4 | 3.0 | NaN | NaN | NaN |
5 | 4.0 | NaN | NaN | NaN |
6 | NaN | NaN | NaN | 78.0 |
7 | NaN | NaN | NaN | 62.0 |
8 | NaN | NaN | NaN | 75.0 |
9 | 1.0 | NaN | NaN | NaN |
Now you can analyze the bins separately, e.g., describe
them:
pivot.describe()
Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|
count | 3.000000 | 2.000000 | 2.000000 | 3.000000 |
mean | 2.666667 | 24.500000 | 41.500000 | 71.666667 |
std | 1.527525 | 6.363961 | 16.263456 | 8.504901 |
min | 1.000000 | 20.000000 | 30.000000 | 62.000000 |
25% | 2.000000 | 22.250000 | 35.750000 | 68.500000 |
50% | 3.000000 | 24.500000 | 41.500000 | 75.000000 |
75% | 3.500000 | 26.750000 | 47.250000 | 76.500000 |
max | 4.000000 | 29.000000 | 53.000000 | 78.000000 |
Upvotes: 1