I want to convert both columns of a two-column DataFrame.
Here is a sample df:
   cost  numbers
1   360       23
2   120       35
3  2000       49
Both columns are float, and I want to convert them to categorical using binning, with the following bins for each column:
Bins for the numbers column: 18-24, 25-44, 45-65, 66-92
Bins for the cost column: >=1000, <1000
I want to convert each column in place rather than creating a new column. Here is my attempt:
import pandas as pd

def PreprocessDataframe(df):
    # use binning to convert numbers and cost to categorical columns
    df['numbers'] = pd.cut(df['numbers'], bins=[18, 24, 25, 44, 45, 65, 66, 92])
    df['cost'] = pd.cut(df['cost'], bins=['=>1000', '<1000'])
    return df
I understand how to convert the "numbers" column, but I am having trouble with the "cost" column. How can I solve this?
If you use bins=[18, 24, 25, 44, 45, 65, 66, 92], pd.cut creates one interval between each pair of consecutive edges: 18-24, 24-25, 25-44, 44-45, and so on, seven in total. You don't need the ones for 24-25, 44-45, etc.
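One way to check which intervals pd.cut builds from an edge list is to inspect the categories of the result. This sketch bins the three sample values from the question with the edges from the question's code; the Series name ages is just for illustration.

```python
import pandas as pd

ages = pd.Series([23, 35, 49])

# 8 edges -> 7 intervals, one between each pair of consecutive edges
binned = pd.cut(ages, bins=[18, 24, 25, 44, 45, 65, 66, 92])

print(len(binned.cat.categories))  # 7
print(binned.cat.categories)       # includes the unwanted 24-25, 44-45 and 65-66 intervals
print(binned.iloc[1])              # 35 lands in the 25-44 interval
```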
By default (right=True), each bin excludes its left edge and includes its right edge, so a value equal to the very first edge falls outside every bin and becomes NaN.
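A minimal sketch of that edge behaviour, using a single 18-24 bin and the two boundary values (the names are only for illustration). Passing right=False flips which edge is included.

```python
import pandas as pd

edge_values = pd.Series([18, 24])

# Default right=True: the bin is (18, 24], so 18 is excluded and becomes NaN
closed_right = pd.cut(edge_values, bins=[18, 24])

# right=False: the bin is [18, 24), so 18 is included and 24 becomes NaN
closed_left = pd.cut(edge_values, bins=[18, 24], right=False)

print(closed_right.isna().tolist())  # [True, False]
print(closed_left.isna().tolist())   # [False, True]
```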
So for numbers, you could instead use bins=[17, 24, 44, 65, 92] (note the 17 in the first position, so that 18 is included).
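pd.cut also has an include_lowest parameter that closes the first interval on the left, which is an alternative to shifting the first edge down to 17. The sketch below compares both on a few boundary ages (made-up values, not from the question). It assumes whole-number ages: for a fractional value such as 17.5 the two approaches differ, since (17, 24] accepts it while the include_lowest version does not.

```python
import pandas as pd

ages = pd.Series([18, 24, 25, 92])
labels = ['18-24', '25-44', '45-65', '66-92']

# Option 1: start the edges at 17 so that 18 falls into (17, 24]
shifted = pd.cut(ages, bins=[17, 24, 44, 65, 92], labels=labels)

# Option 2: keep 18 as the first edge and include it in the first bin
lowest = pd.cut(ages, bins=[18, 24, 44, 65, 92], labels=labels, include_lowest=True)

print(shifted.tolist())  # ['18-24', '18-24', '25-44', '66-92']
print(lowest.tolist())   # ['18-24', '18-24', '25-44', '66-92']
```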
The optional labels parameter lets you choose a label for each bin.
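import numpy as np
import pandas as pd

# numbers: intervals (17, 24], (24, 44], (44, 65], (65, 92]
df['numbers'] = pd.cut(df['numbers'], bins=[17, 24, 44, 65, 92], labels=['18-24', '25-44', '45-65', '66-92'])
# cost: right=False closes the intervals on the left, giving [-inf, 1000) and [1000, inf),
# so exactly 1000 is '>=1000' and every value (including 0) gets a bin
df['cost'] = pd.cut(df['cost'], bins=[-np.inf, 1000, np.inf], labels=['<1000', '>=1000'], right=False)
print(df)
     cost numbers
0   <1000   18-24
1   <1000   25-44
2  >=1000   45-65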
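Put back into the question's PreprocessDataframe function, a minimal sketch could look like this. It assumes values outside 18-92 should simply become NaN and that cost can be any real number. Assigning the result of pd.cut to the same column name replaces the float column with a categorical one instead of adding a new column. The test frame is made-up data chosen to hit the cost cut-off (999.99 and 1000.0).

```python
import numpy as np
import pandas as pd


def PreprocessDataframe(df):
    """Replace the float 'numbers' and 'cost' columns with binned categoricals."""
    # (17, 24], (24, 44], (44, 65], (65, 92]; values outside 18-92 become NaN
    df['numbers'] = pd.cut(df['numbers'], bins=[17, 24, 44, 65, 92],
                           labels=['18-24', '25-44', '45-65', '66-92'])
    # [-inf, 1000) and [1000, inf): exactly 1000 counts as '>=1000'
    df['cost'] = pd.cut(df['cost'], bins=[-np.inf, 1000, np.inf],
                        labels=['<1000', '>=1000'], right=False)
    return df


df = pd.DataFrame({'cost': [360.0, 999.99, 1000.0, 2000.0],
                   'numbers': [23.0, 35.0, 49.0, 70.0]})
df = PreprocessDataframe(df)
print(df)
```

Note that the function also modifies the DataFrame it is given, because it assigns columns on that same object; pass df.copy() if the original floats are still needed.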