Danish
Danish

Reputation: 2871

Groupby count on multiple condition and multiple columns pandas

I have a dataframe as show below.

ID  Ownwer_ID   Building   Nationality  Age   Sector
1   2           Villa      India        24    SE1
2   2           Villa      India        28    SE1
3   4           Apartment  USA          82    SE2
4   4           Apartment  USA          68    SE2
5   7           Villa      UK           32    SE2
6   7           Villa      UK           28    SE2
7   7           Villa      UK            4    SE2
8   8           LabourCamp Pakistan     27    SE3
9   2           Villa      India        1     SE1
10  10          LabourCamp India        23    SE2
11  11          Apartment  Germany      34    SE3

In the above data ID is unique, which represents a person.

From the above dataframe I would like prepare below dataframe

Sector   #Age_0-12  #Agemore70   #Asians  #Europe  #USA  #Asians_LabourCamp #USA_Apartment
SE1      1          0            3        0        0     0                  0
SE2      1          1            1        3        2     1                  2
SE3      0          0            1        1        0     1                  0

where Asians I considered Nationality India or Pakistan. Europe = Nationality UK or Germany.

#Age_0-12 = Number of peoples having age between 0 to 12 (inclusive)

#Agemore70 = Number of peoples having age more than or eaqual to 70

similarly remaining all columns are number of people explained by their name.

I tried following code

d = {'India': 'Asians', 'Pakistan': 'Asians', 'UK': 'Europe', 'Germany': 'Europe',
'USA': 'USA'}
df['natinality_Group'] = df['Nationality'].map(d)

bins = [-1, , 12, , 21, 50, 100]
df['binned_age'] = pd.cut(df['Age'], bins)

After that I was clueless, can please help me if you have solution?

Upvotes: 2

Views: 759

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Let's try this, using pd.cut to get age groups and pd.get_dummies with groupby to get counts of each value in the selected columns:

df['Age Group'] = pd.cut(df['Age'],[0,12,70,np.inf],labels=['Age_0-12','Age_12-70','Agemore70'])


df_out = pd.get_dummies(df[['Sector','Building', 'Age Group', 'Nationality']], 
                        columns=['Age Group', 'Building', 'Nationality'], 
                        prefix='#', prefix_sep='').groupby('Sector').sum()

Output:

       #Age_0-12  #Age_12-70  #Agemore70  #Apartment  #LabourCamp  #Villa  \
Sector                                                                       
SE1             1           2           0           0            0       3   
SE2             1           4           1           2            1       3   
SE3             0           2           0           1            1       0   

        #Germany  #India  #Pakistan  #UK  #USA  
Sector                                          
SE1            0       3          0    0     0  
SE2            0       1          0    3     2  
SE3            1       0          1    0     0  

Upvotes: 1

Related Questions