Reputation: 2871
I have a dataframe as show below.
ID Ownwer_ID Building Nationality Age Sector
1 2 Villa India 24 SE1
2 2 Villa India 28 SE1
3 4 Apartment USA 82 SE2
4 4 Apartment USA 68 SE2
5 7 Villa UK 32 SE2
6 7 Villa UK 28 SE2
7 7 Villa UK 4 SE2
8 8 LabourCamp Pakistan 27 SE3
9 2 Villa India 1 SE1
10 10 LabourCamp India 23 SE2
11 11 Apartment Germany 34 SE3
In the above data ID is unique, which represents a person.
From the above dataframe I would like prepare below dataframe
Sector #Age_0-12 #Agemore70 #Asians #Europe #USA #Asians_LabourCamp #USA_Apartment
SE1 1 0 3 0 0 0 0
SE2 1 1 1 3 2 1 2
SE3 0 0 1 1 0 1 0
where Asians I considered Nationality India or Pakistan. Europe = Nationality UK or Germany.
#Age_0-12 = Number of peoples having age between 0 to 12 (inclusive)
#Agemore70 = Number of peoples having age more than or eaqual to 70
similarly remaining all columns are number of people explained by their name.
I tried following code
d = {'India': 'Asians', 'Pakistan': 'Asians', 'UK': 'Europe', 'Germany': 'Europe',
'USA': 'USA'}
df['natinality_Group'] = df['Nationality'].map(d)
bins = [-1, , 12, , 21, 50, 100]
df['binned_age'] = pd.cut(df['Age'], bins)
After that I was clueless, can please help me if you have solution?
Upvotes: 2
Views: 759
Reputation: 153460
Let's try this, using pd.cut
to get age groups and pd.get_dummies
with groupby
to get counts of each value in the selected columns:
df['Age Group'] = pd.cut(df['Age'],[0,12,70,np.inf],labels=['Age_0-12','Age_12-70','Agemore70'])
df_out = pd.get_dummies(df[['Sector','Building', 'Age Group', 'Nationality']],
columns=['Age Group', 'Building', 'Nationality'],
prefix='#', prefix_sep='').groupby('Sector').sum()
Output:
#Age_0-12 #Age_12-70 #Agemore70 #Apartment #LabourCamp #Villa \
Sector
SE1 1 2 0 0 0 3
SE2 1 4 1 2 1 3
SE3 0 2 0 1 1 0
#Germany #India #Pakistan #UK #USA
Sector
SE1 0 3 0 0 0
SE2 0 1 0 3 2
SE3 1 0 1 0 0
Upvotes: 1