Reputation: 469
I have the following pandas dataframe:
Suburb      Percentile Rank
Hume        0.20464135
Clayton     0.409162146
Moorabin    0.654550934
St Kilda    0.80464135
Point Cook  1.505447257
I want to create a new column called "Rank Classifier" based on the value in the "Percentile Rank" column.
The rules would look like this:
perc_rank <= 0.2 then 'Very Low',
perc_rank > 0.2 and perc_rank <= 0.4 then 'Low',
perc_rank > 0.4 and perc_rank <= 0.6 then 'Medium',
perc_rank > 0.6 and perc_rank <= 0.8 then 'High',
perc_rank > 0.8 and perc_rank <= 1.0 then 'Very High'
I was able to produce the classifier output in SQL, but I am unable to do the same in Python by creating a new column.
I tried this:
def Rank Classifier
if (perc_rank <= 0.2):
Rank Classifier = "Very Low"
elif (perc_rank > 2) & (perc_rank <= 0.4):
Rank Classifier = "Low"
elif (perc_rank > 0.4) & (perc_rank <= 0.6):
Rank Classifier = "Medium"
elif (perc_rank > 0.6) & (perc_rank <= 0.8):
Rank Classifier = "High"
elif (perc_rank > 8) & (perc_rank <=1 ):
Rank Classifier = "Very High"
else:
return Rank Classifier
I get this error:
IndentationError: unindent does not match any outer indentation level
I would like to return the classifier data in a new column called "Rank Classifier".
The output would look like the following:
Suburb      Percentile Rank  Rank Classifier
Hume        0.20464135       Very Low
Clayton     0.409162146      Low
Moorabin    0.654550934      Medium
St Kilda    0.80464135       High
Point Cook  1.505447257      Very High
Upvotes: 1
Views: 151
Reputation: 1486
Try using apply:
def RankClassifier(perc_rank):
    # Map a single percentile rank to its label, following the rules in the question.
    if perc_rank <= 0.2:
        return "Very Low"
    elif 0.2 < perc_rank <= 0.4:
        return "Low"
    elif 0.4 < perc_rank <= 0.6:
        return "Medium"
    elif 0.6 < perc_rank <= 0.8:
        return "High"
    elif 0.8 < perc_rank <= 1:
        return "Very High"
    else:
        # Values above 1 match none of the rules, so no label is returned.
        return None

# Pass the function itself to apply; it is called once per value in the column.
df['Rank Classifier'] = df['Percentile Rank'].apply(RankClassifier)
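To see what the apply approach does with the sample data from the question, here is a minimal self-contained sketch. It assumes the thresholds 0.2 and 0.8 were intended where the question's code has 2 and 8, and the function name `classify_rank` is only an illustrative choice. Values above 1.0 match none of the stated rules, so the sketch leaves them unlabeled rather than guessing a category.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Suburb": ["Hume", "Clayton", "Moorabin", "St Kilda", "Point Cook"],
    "Percentile Rank": [0.20464135, 0.409162146, 0.654550934,
                        0.80464135, 1.505447257],
})


def classify_rank(perc_rank):
    """Return the label for one percentile rank (hypothetical helper)."""
    if perc_rank <= 0.2:
        return "Very Low"
    elif perc_rank <= 0.4:
        return "Low"
    elif perc_rank <= 0.6:
        return "Medium"
    elif perc_rank <= 0.8:
        return "High"
    elif perc_rank <= 1.0:
        return "Very High"
    # Assumption: anything above 1.0 is outside the rules, so no label.
    return None


# apply calls classify_rank once per value and collects the results.
df["Rank Classifier"] = df["Percentile Rank"].apply(classify_rank)
print(df)
```

With the rules applied literally, Hume (0.2046...) lands in "Low" rather than "Very Low", and Point Cook gets no label, so the result differs from the expected output shown in the question.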
Upvotes: 1
Reputation: 9857
Instead of applying a function, look at using pandas.cut.
The code below gives the result you said you expected, but you might need to tweak things.
import numpy as np
import pandas as pd

# Each label covers the half-open interval (left, right] between consecutive bin edges.
bins = [0.2, 0.4, 0.6, 0.8, 1, np.inf]
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']
df['Rank Classifier'] = pd.cut(df['Percentile Rank'], bins=bins, labels=labels)
Note that, as I said, the above gives the desired output shown in the question.
However, I'm not sure that desired output is correct.
For example, shouldn't Hume be classified as Low rather than Very Low?
Also, how can Point Cook have a Percentile Rank of 1.505447257?
I think you need to check your criteria.
P.S. The bins list should really start at 0, and its last value should be 1:
bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
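As a rough sketch of what the 0-to-1 bins do to the sample data from the question, the snippet below rebuilds the dataframe and applies pandas.cut with them. The `include_lowest=True` argument is my own addition, on the assumption that a rank of exactly 0 should count as 'Very Low'; by default the first interval excludes its left edge.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Suburb": ["Hume", "Clayton", "Moorabin", "St Kilda", "Point Cook"],
    "Percentile Rank": [0.20464135, 0.409162146, 0.654550934,
                        0.80464135, 1.505447257],
})

# Six edges define five right-closed intervals, one per label.
bins = [0, 0.2, 0.4, 0.6, 0.8, 1]
labels = ["Very Low", "Low", "Medium", "High", "Very High"]

df["Rank Classifier"] = pd.cut(
    df["Percentile Rank"],
    bins=bins,
    labels=labels,
    include_lowest=True,  # assumption: a rank of exactly 0 should be 'Very Low'
)
print(df)
```

With these bins Hume comes out as Low, and Point Cook falls outside every interval, so pandas.cut leaves it as NaN. That makes the out-of-range value easy to spot, for example with `df['Rank Classifier'].isna()`.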
Upvotes: 6