Reputation: 4470
df:
     A
0  219
1  590
2  272
3  945
4  175
5  930
6  662
7  472
8  251
9  130
I am trying to create a new column, quantile, based on which quantile each value falls in. For example:
if value <= 1st quantile (25%): value = 1
if 1st quantile < value <= 2nd quantile (50%): value = 2
if 2nd quantile < value <= 3rd quantile (75%): value = 3
if 3rd quantile < value <= 4th quantile (100%): value = 4
Code:
# Quantile thresholds, computed once before the column is modified
f_q = df['A'].quantile(0.25)
s_q = df['A'].quantile(0.5)
t_q = df['A'].quantile(0.75)
fo_q = df['A'].quantile(1)

# Walk the rows and overwrite each value with its quantile number (1-4)
for index in range(len(df)):
    value = df.at[index, "A"]
    if 0 < value <= f_q:
        df.at[index, "A"] = 1
    elif f_q < value <= s_q:
        df.at[index, "A"] = 2
    elif s_q < value <= t_q:
        df.at[index, "A"] = 3
    elif t_q < value <= fo_q:
        df.at[index, "A"] = 4
The code works fine, but I would like to know whether there is a more efficient pandas way of doing this. Any suggestions are welcome.
Upvotes: 1
Views: 97
Reputation: 402263
Yes, using pd.qcut:
>>> pd.qcut(df.A, 4).cat.codes + 1
0 1
1 3
2 2
3 4
4 1
5 4
6 4
7 3
8 2
9 1
dtype: int8
(This gives exactly the same result as your code.)
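For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and stores the result in a new column. The column name quantile is taken from the question's wording; everything else is the qcut call shown above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"A": [219, 590, 272, 945, 175, 930, 662, 472, 251, 130]})

# qcut splits the values into 4 equal-sized bins and returns a categorical;
# .cat.codes gives 0-based bin numbers, so add 1 to get labels 1-4
df["quantile"] = pd.qcut(df["A"], 4).cat.codes + 1

print(df)
```

Unlike the loop in the question, this leaves column A untouched and writes the labels next to it.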
You could also call np.unique on the qcut result:
>>> np.unique(pd.qcut(df.A, 4), return_inverse=True)[1] + 1
array([1, 3, 2, 4, 1, 4, 4, 3, 2, 1])
Or, using pd.factorize (note the slight difference in the output: the labels are numbered in order of first appearance rather than by quantile rank):
>>> pd.factorize(pd.qcut(df.A, 4))[0] + 1
array([1, 2, 3, 4, 1, 4, 4, 2, 3, 1])
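If the labels need to follow quantile order, another option is qcut's labels=False argument, which returns the integer bin numbers directly instead of a categorical. A small sketch on the same data, with ranks as an illustrative variable name:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"A": [219, 590, 272, 945, 175, 930, 662, 472, 251, 130]})

# labels=False makes qcut return 0-based integer bin numbers
# instead of interval categories; add 1 to get labels 1-4
ranks = pd.qcut(df["A"], 4, labels=False) + 1

print(ranks.tolist())
```

One caveat to keep in mind: on data with many repeated values the bin edges may not be unique, in which case qcut raises a ValueError unless duplicates="drop" is passed, and then fewer than 4 bins may come back.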
Upvotes: 2