New column in Pandas dataframe based on value of variable in existing column

Question

I am having difficulty creating a new column whose value is based on the value of an existing column in the same dataframe. The existing column is numeric, and I'm trying to give the new column a categorical value of high, medium, or low based on something like:

low: < (max-min)/3

med: between (max-min)/3 and (max-min)/3 *2

high: > (max-min)/3 *2

Still learning Pandas, so any help is appreciated. Thanks!

EDIT:

This is what I have attempted:

df_unit_day_hour['Level_Score'] = pd.cut(df_unit_day_hour['Level_Score'], q=3, labels=['low', 'medium', 'high'])

I think it's almost what I need, but I'm getting an error (KeyError). Would it be because df_unit_day_hour['Level_Score'] is a float?

firelynx · Accepted Answer

It sounds like you are trying to recreate what the pd.cut function already does.

Consider this example:

import numpy as np
import pandas as pd

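# Ten random integers from 0 to 9, so the output will differ from run to run
df = pd.DataFrame({'val': np.random.choice(10, 10)})
# Bin edges: (-1, 2] -> low, (2, 5] -> medium, (5, 10] -> high
df['cat'] = pd.cut(df['val'], [-1, 2, 5, 10], labels=['low', 'medium', 'high'])
df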

   val   cat
0    6  high
1    2   low
2    7  high
3    7  high
4    8  high
5    8  high
6    9  high
7    6  high
8    2   low
9    0   low
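
The bins above are hard-coded. If you want them derived from the column's own min and max, as in the question, passing an integer to bins makes pd.cut split the range into that many equal-width intervals. Note that pd.cut has no q argument; that parameter belongs to pd.qcut, which bins by quantiles rather than equal widths. Below is a minimal sketch, assuming fixed sample data and hypothetical column names ('level', 'level_manual') that are not from the question:

```python
import pandas as pd

# Hypothetical sample data; 'val' stands in for the numeric column in the question.
df = pd.DataFrame({'val': [0, 2, 2, 6, 6, 7, 7, 8, 8, 9]})
labels = ['low', 'medium', 'high']

# bins=3 splits the range [min, max] into three equal-width intervals.
df['level'] = pd.cut(df['val'], bins=3, labels=labels)

# The same thing with explicit edges computed from min and max.
lo, hi = df['val'].min(), df['val'].max()
width = (hi - lo) / 3
edges = [lo, lo + width, lo + 2 * width, hi]
# include_lowest=True keeps the minimum value inside the first bin.
df['level_manual'] = pd.cut(df['val'], bins=edges, labels=labels, include_lowest=True)

print(df)
```

Writing the result to a new column also leaves the original numeric values untouched. As for the KeyError, one possible cause (an assumption, since the question does not show the traceback) is that the column read on the right-hand side does not exist in the dataframe under that exact name.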
