How to create a new column with status based on value

Question

I have the following pandas DataFrame:

Suburb      Percentile Rank
Hume        0.20464135
Clayton     0.409162146
Moorabin    0.654550934
St Kilda    0.80464135
Point Cook  1.505447257

I want to create a new column called "Rank Classifier" based on the value in the "Percentile Rank" column.

The rules would look like this:

perc_rank <= 0.2 then 'Very Low', 
perc_rank > 0.2 and perc_rank <= 0.4 then 'Low', 
perc_rank > 0.4 and perc_rank <= 0.6 then 'Medium', 
perc_rank > 0.6 and perc_rank <= 0.8 then 'High', 
perc_rank > 0.8 and perc_rank <= 1.0 then 'Very High'

I was able to produce the classifier output in SQL, but I am unable to do the same in Python by creating a new column.

I tried this:

def Rank Classifier

     if (perc_rank  <= 0.2):
               Rank Classifier = "Very Low"
            elif (perc_rank > 2) & (perc_rank <= 0.4):
                Rank Classifier = "Low"
            elif (perc_rank > 0.4) & (perc_rank  <= 0.6):
                Rank Classifier = "Medium"
            elif (perc_rank  > 0.6) & (perc_rank <= 0.8):
                Rank Classifier = "High"
            elif (perc_rank > 8) & (perc_rank <=1 ):
                Rank Classifier = "Very High"
                
        else:
            return Rank Classifier

I get this error:

IndentationError: unindent does not match any outer indentation level

I would like to return the classifier data in a new column called "Rank Classifier".

The output would look like the following:

Suburb      Percentile Rank  Rank Classifier
Hume        0.20464135       Very Low
Clayton     0.409162146      Low
Moorabin    0.654550934      Medium
St Kilda    0.80464135       High
Point Cook  1.505447257      Very High

norie · Accepted Answer

Instead of applying a function, look at using pandas.cut.

The code below will give you the result you indicated you expected, but you might need to tweak things.

import numpy as np
import pandas as pd

bins = [0.2, 0.4, 0.6, 0.8, 1, np.inf]
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

df['Rank Classifier'] = pd.cut(df['Percentile Rank'], bins=bins, labels=labels)

Note that, as I said, the above gives the desired output you indicated in the question.
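
To check that claim against the sample data, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values in the question, so the construction step is an assumption about how df was created. pd.cut uses right-closed intervals by default, so these bins mean (0.2, 0.4], (0.4, 0.6] and so on.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (rebuilt by hand for illustration).
df = pd.DataFrame({
    "Suburb": ["Hume", "Clayton", "Moorabin", "St Kilda", "Point Cook"],
    "Percentile Rank": [0.20464135, 0.409162146, 0.654550934,
                        0.80464135, 1.505447257],
})

bins = [0.2, 0.4, 0.6, 0.8, 1, np.inf]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# Each value is assigned the label of the right-closed interval it falls in:
# (0.2, 0.4] -> 'Very Low', (0.4, 0.6] -> 'Low', ..., (1, inf] -> 'Very High'.
df["Rank Classifier"] = pd.cut(df["Percentile Rank"], bins=bins, labels=labels)

print(df)
```

With these bins the labels are shifted one step relative to the rules in the question, which is why the result matches the expected table.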

However, I'm not sure that desired output is correct.

For example, shouldn't Hume be classified as Low rather than Very Low? Its value, 0.20464135, is greater than 0.2.

Also, how can Point Cook have a Percentile Rank of 1.505447257?

I think you need to check your criteria.

P.S. The bins list should really start at 0 and the last value should be 1.

bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
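
As a rough sketch of what these bins do with the sample data from the question: the DataFrame is again rebuilt by hand, and include_lowest=True is my addition (not part of the original snippet), assumed here so that a value of exactly 0 lands in 'Very Low', in line with the rule perc_rank <= 0.2.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand for illustration).
df = pd.DataFrame({
    "Suburb": ["Hume", "Clayton", "Moorabin", "St Kilda", "Point Cook"],
    "Percentile Rank": [0.20464135, 0.409162146, 0.654550934,
                        0.80464135, 1.505447257],
})

bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

# include_lowest=True closes the first interval on the left, so 0 itself
# is binned instead of becoming NaN. This is an assumption, not part of
# the answer's code.
df["Rank Classifier"] = pd.cut(
    df["Percentile Rank"], bins=bins, labels=labels, include_lowest=True
)

print(df)
```

Here Hume comes out as Low, and Point Cook falls outside every bin, so its classifier is NaN. That is a useful signal that a Percentile Rank above 1 needs checking.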
