Reputation: 577
I am having difficulty creating a new column whose value is based on an existing column in the same dataframe. The existing column is numeric, and I'm trying to give the new column a categorical value of high, medium, or low based on something like:
low: < (max-min)/3
med: (max-min)/3 - (max-min)/3 *2
high: > (max-min)/3 *2
Still learning Pandas, so any help is appreciated. Thanks!
EDIT:
This is what I have attempted:
df_unit_day_hour['Level_Score'] = pd.cut(df_unit_day_hour['Level_Score'], q=3, labels=['low', 'medium', 'high'])
I think it's almost what I need, but I'm getting an error (KeyError). Would it be because df_unit_day_hour['Level_Score'] is a float?
Upvotes: 2
Views: 2370
Reputation: 32204
Sounds like you want the pd.cut function.
Consider this example below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'val':np.random.choice(10, 10)})
df['cat'] = pd.cut(df['val'], [-1,2,5,10], labels=['low', 'medium', 'high'])
df
   val   cat
0    6  high
1    2   low
2    7  high
3    7  high
4    8  high
5    8  high
6    9  high
7    6  high
8    2   low
9    0   low
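As for the attempt in the question: pd.cut has no q parameter (q belongs to pd.qcut), and a KeyError usually means the column name was not found in the dataframe rather than a problem with floats. Here is a minimal sketch of both options, assuming a numeric column named Level_Score. The sample values and the new column names Level_Cat and Level_Quantile are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the asker's numeric column.
df = pd.DataFrame({'Level_Score': [0.5, 1.2, 3.3, 4.8, 6.1, 7.7, 9.0]})

labels = ['low', 'medium', 'high']

# Equal-width bins: an integer `bins` splits the range [min, max]
# into that many intervals of equal width.
df['Level_Cat'] = pd.cut(df['Level_Score'], bins=3, labels=labels)

# Equal-frequency bins: pd.qcut takes `q` and cuts at quantiles,
# so each bin holds roughly the same number of rows.
df['Level_Quantile'] = pd.qcut(df['Level_Score'], q=3, labels=labels)

print(df)
```

Passing bins=3 gives the (max-min)/3 thresholds from the question, measured from the column minimum. Writing the result to a new column also keeps the original numeric values intact.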
Upvotes: 6