Collective Action
Collective Action

Reputation: 8009

Groupby aggregate and create new columns from row cells

I have the following code I'm working with in JuypterNotebook:

import pandas as pd
import numpy as np

    url = 'https://raw.githubusercontent.com/dothemathonthatone/maps/master/fruchtbarkeit.csv'
fruchtdf = pd.read_csv(url)


fruchtdf = fruchtdf.set_axis(['year', 'regional_schlüssel', 'kreis_frei', 'gender', 'nationality', 'u15', '15', '16', '17', '18', '19', '20',
 '21', '22', '23', '24', '25', '26', '27', '28', '29', '30','31', '32', '33', '34', '35', '47', '48', '49', 'Ü5',
 '38', '39', '36', '37', '40', '41', '42', '43', '44', '45', '46', 'uknwn'], axis=1, inplace=False)

fruchtdf['15']= fruchtdf['u15']+fruchtdf['15']

fruchtdf.drop(['u15'], axis=1, inplace=True)



      year  regional_schlüssel  kreis_frei       gender     nationality     15  16  17  18  19  ...     36  37  40  41  42  43  44  45  46  uknwn
0   2000    5111000     Düsseldorf, krfr. Stadt     man     Deutsche    --1     7   9   20  24  ...     13  1   -   1   1   -   -   -   -   -
1   2000    5111000     Düsseldorf, krfr. Stadt     man     Ausländerin     ---     3   3   7   17  ...     4   3   1   -   -   -   -   -   1   -
2   2000    5111000     Düsseldorf, krfr. Stadt     woman   Deutsche    --1     4   7   14  20  ...     9   4   3   2   1   -   -   -   -   -
3   2000    5111000     Düsseldorf, krfr. Stadt     woman   Ausländerin     ---     1   5   10  17  ...     2   4   1   1   1   -   -   -   -   -
4   2000    5111000     Düsseldorf, krfr. Stadt     man     Deutsche    --1     9   14  30  45  ...     3   1   -   -   -   -   -   -   -   -

I am trying to aggregate columns 15 to unknown grouping by nationality, year, and regioinal_schlüssel

year regional_schlüssel nationality gender 15 16 17 ... unknown 2000 5111000 Deutsche man 1 4 4 7 2000 5111000 Deutsche woman 1 4 4 3 2000 5111000 Auslande man 1 4 4 7 2000 5111000 Auslande woman 1 4 4 3

desired output:

year regional_schlüssel nationality gender 15 16 17 ... unknown 2000 5111000 Deutsche man 2 8 8 10 2000 5111000 Auslande man 2 8 8 10

Then I would like to make 2 new sets of columns for each nationality: De15, De16, ..., unknown, and Aus15, Aus16, ..., Ausunknown

year regional_schlüssel nationality gender De15 De16 De17 ... Deunknown Aus15 Aus16 Aus17 Ausunknown 2000 5111000 Deutsche man 2 8 8 10 2 8 8 10

Is this possible?

Upvotes: 1

Views: 30

Answers (1)

jezrael
jezrael

Reputation: 863701

Convert all columns from 5th column to numeric, if no numeric is created missing value:

fruchtdf.iloc[:, 5:] = fruchtdf.iloc[:, 5:].apply(pd.to_numeric, errors='coerce')

Then aggregate sum:

sumcol = ['year', 'regional_schlüssel','nationality', 'gender']
df = fruchtdf.groupby(sumcol).sum()

For first DataFrame convert MultiIndex to columns by reset_index():

df1 = df.reset_index()
print (df1)
       year  regional_schlüssel  nationality gender   15    16    17    18  \
0      2000             5111000  Ausländerin    man  0.0   4.0  10.0  26.0   
1      2000             5111000  Ausländerin  woman  0.0   4.0  10.0  30.0   
2      2000             5111000     Deutsche    man -2.0  16.0  23.0  50.0   
3      2000             5111000     Deutsche  woman -2.0   9.0  22.0  39.0   
4      2000             5113000  Ausländerin    man  0.0   1.0   7.0  11.0   
    ...                 ...          ...    ...  ...   ...   ...   ...   
29155  2017             5978036     Deutsche  woman  0.0   0.0   0.0   1.0   
29156  2017             5978040  Ausländerin    man  0.0   0.0   0.0   0.0   
29157  2017             5978040  Ausländerin  woman  0.0   0.0   0.0   0.0   
29158  2017             5978040     Deutsche    man -1.0   1.0   1.0   1.0   
29159  2017             5978040     Deutsche  woman  0.0   0.0   1.0   2.0   

         19    20  ...    36   37   40   41   42   43   44   45   46  uknwn  
0      44.0  68.0  ...   5.0  3.0  2.0  1.0  1.0  1.0  0.0  0.0  1.0    0.0  
1      45.0  66.0  ...   5.0  5.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0    0.0  
2      69.0  75.0  ...  16.0  2.0  0.0  1.0  1.0  0.0  0.0  0.0  0.0    0.0  
3      54.0  82.0  ...  15.0  5.0  5.0  2.0  1.0  0.0  0.0  0.0  0.0    0.0  
4      20.0  22.0  ...   2.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0    0.0  
    ...   ...  ...   ...  ...  ...  ...  ...  ...  ...  ...  ...    ...  
29155   0.0   0.0  ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    0.0  
29156   0.0   0.0  ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    0.0  
29157   0.0   2.0  ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    0.0  
29158   2.0   3.0  ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0    0.0  
29159   0.0   0.0  ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0    0.0  

[29160 rows x 41 columns]

And for second reshape by Series.unstack and convert first letter of nationality to columns names by f-strings:

df2 = df.unstack(2)
df2.columns = df2.columns.map(lambda x: f'{x[1][:3]}{x[0]}')
df2 = df2.reset_index()
print (df2)
       year  regional_schlüssel gender  Aus15  Deu15  Aus16  Deu16  Aus17  \
0      2000             5111000    man    0.0   -2.0    4.0   16.0   10.0   
1      2000             5111000  woman    0.0   -2.0    4.0    9.0   10.0   
2      2000             5113000    man    0.0   -1.0    1.0    8.0    7.0   
3      2000             5113000  woman   -1.0    0.0    3.0    6.0    6.0   
4      2000             5114000    man    0.0    0.0    0.0    2.0    0.0   
    ...                 ...    ...    ...    ...    ...    ...    ...   
14575  2017             5978032  woman    0.0    0.0    0.0    0.0    0.0   
14576  2017             5978036    man    0.0   -2.0    0.0    0.0    1.0   
14577  2017             5978036  woman    0.0    0.0    0.0    0.0    0.0   
14578  2017             5978040    man    0.0   -1.0    0.0    1.0    0.0   
14579  2017             5978040  woman    0.0    0.0    0.0    0.0    0.0   

       Deu17  Aus18  ...  Aus43  Deu43  Aus44  Deu44  Aus45  Deu45  Aus46  \
0       23.0   26.0  ...    1.0    0.0    0.0    0.0    0.0    0.0    1.0   
1       22.0   30.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
2       16.0   11.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
3       17.0    8.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
4        4.0    5.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
     ...    ...  ...    ...    ...    ...    ...    ...    ...    ...   
14575    0.0    1.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
14576    1.0    2.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
14577    0.0    2.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
14578    1.0    0.0  ...    0.0    0.0    0.0    0.0    0.0    1.0    0.0   
14579    1.0    0.0  ...    0.0    0.0    0.0    0.0    0.0    0.0    0.0   

       Deu46  Ausuknwn  Deuuknwn  
0        0.0       0.0       0.0  
1        0.0       0.0       0.0  
2        0.0       0.0       0.0  
3        0.0       0.0       0.0  
4        0.0       0.0       0.0  
     ...       ...       ...  
14575    0.0       0.0       0.0  
14576    0.0       0.0       0.0  
14577    0.0       0.0       0.0  
14578    0.0       0.0       0.0  
14579    0.0       0.0       0.0  

[14580 rows x 77 columns]

Upvotes: 1

Related Questions