Reputation: 13
Given the following dataframe in Pandas:
"Age","Gender","Impressions","Clicks","Signed_In"
36,0,3,0,1
73,1,3,0,1
30,0,3,0,1
49,1,3,0,1
47,1,11,0,1
I need to add a categorical column that holds the bin label for each row, based on age. For instance, for the row:
36,0,3,0,1
I want another column to show 'Between 35 and 45'.
The final record should look like this:
36,0,3,0,1,'Between 35 and 45'
Upvotes: 0
Views: 2459
Reputation: 5591
You should create a sample set of data to help people answer your questions:
import pandas as pd
import numpy as np
d = {'Age': [36, 73, 30, 49, 47],
     'Gender': [0, 1, 0, 1, 1],
     'Impressions': [3, 3, 3, 3, 11],
     'Clicks': [0, 0, 0, 0, 0],
     'Signed_In': [1, 1, 1, 1, 1]}
df = pd.DataFrame(d)
That way people can copy and paste it instead of having to recreate your data by hand.
NumPy's round function accepts a negative number of decimals; -1 rounds to the nearest ten:
df['Age_rounded'] = np.round(df['Age'], -1)
   Age  Clicks  Gender  Impressions  Signed_In  Age_rounded
0   36       0       0            3          1           40
1   73       0       1            3          1           70
2   30       0       0            3          1           30
3   49       0       1            3          1           50
4   47       0       1           11          1           50
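One caveat worth checking before relying on the rounding step: NumPy rounds exact halves to the nearest even value, so ages ending in 5 do not all go the same way. A small check on a few made-up boundary ages (these are not from the question's data):

```python
import numpy as np

# Hypothetical boundary ages, chosen only to illustrate the tie-breaking rule.
boundary_ages = np.array([25, 35, 45, 55])

# decimals=-1 rounds to the nearest ten; exact halves go to the even multiple.
rounded = np.round(boundary_ages, -1)
print(rounded.tolist())  # [20, 40, 40, 60]
```

So 35 and 45 land in the same bucket while 25 drops to 20. If the exact bin boundaries matter, explicit bin edges are the safer route.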
You can then map a dictionary onto the rounded values; any rounded value without a key in the dictionary becomes NaN:
categories_dict = {30: 'Between 25 and 35',
                   40: 'Between 35 and 45',
                   50: 'Between 45 and 55',
                   70: 'Between 65 and 75'}
df['category'] = df['Age_rounded'].map(categories_dict)
   Age  Clicks  Gender  Impressions  Signed_In  Age_rounded           category
0   36       0       0            3          1           40  Between 35 and 45
1   73       0       1            3          1           70  Between 65 and 75
2   30       0       0            3          1           30  Between 25 and 35
3   49       0       1            3          1           50  Between 45 and 55
4   47       0       1           11          1           50  Between 45 and 55
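As an alternative to rounding plus a dictionary (a different technique from the one above), pandas has pd.cut for binning directly. Here is a rough sketch; the bin edges and the choice of left-closed intervals (right=False) are my assumptions, since the question doesn't say which bin an age of exactly 35 or 45 belongs to:

```python
import pandas as pd

df = pd.DataFrame({'Age': [36, 73, 30, 49, 47]})

# Assumed bin edges; adjust to the ranges you actually need.
edges = [25, 35, 45, 55, 65, 75]

# Build one label per interval, e.g. 'Between 35 and 45'.
labels = [f'Between {lo} and {hi}' for lo, hi in zip(edges[:-1], edges[1:])]

# right=False makes each bin include its lower edge and exclude its upper edge.
df['category'] = pd.cut(df['Age'], bins=edges, labels=labels, right=False)
print(df)
```

The new column is a pandas Categorical, and every bin gets a label even if no row in the sample falls into it. Ages outside the outermost edges come back as NaN.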
Upvotes: 3