Reputation: 8009
I have the following code I'm working with in JuypterNotebook:
import pandas as pd
import numpy as np
url = 'https://raw.githubusercontent.com/dothemathonthatone/maps/master/fruchtbarkeit.csv'
fruchtdf = pd.read_csv(url)
fruchtdf = fruchtdf.set_axis(['year', 'regional_schlüssel', 'kreis_frei', 'gender', 'nationality', 'u15', '15', '16', '17', '18', '19', '20',
'21', '22', '23', '24', '25', '26', '27', '28', '29', '30','31', '32', '33', '34', '35', '47', '48', '49', 'Ü5',
'38', '39', '36', '37', '40', '41', '42', '43', '44', '45', '46', 'uknwn'], axis=1, inplace=False)
fruchtdf['15']= fruchtdf['u15']+fruchtdf['15']
fruchtdf.drop(['u15'], axis=1, inplace=True)
year regional_schlüssel kreis_frei gender nationality 15 16 17 18 19 ... 36 37 40 41 42 43 44 45 46 uknwn
0 2000 5111000 Düsseldorf, krfr. Stadt man Deutsche --1 7 9 20 24 ... 13 1 - 1 1 - - - - -
1 2000 5111000 Düsseldorf, krfr. Stadt man Ausländerin --- 3 3 7 17 ... 4 3 1 - - - - - 1 -
2 2000 5111000 Düsseldorf, krfr. Stadt woman Deutsche --1 4 7 14 20 ... 9 4 3 2 1 - - - - -
3 2000 5111000 Düsseldorf, krfr. Stadt woman Ausländerin --- 1 5 10 17 ... 2 4 1 1 1 - - - - -
4 2000 5111000 Düsseldorf, krfr. Stadt man Deutsche --1 9 14 30 45 ... 3 1 - - - - - - - -
I am trying to aggregate columns 15
to unknown
grouping by nationality
, year
, and regioinal_schlüssel
year regional_schlüssel nationality gender 15 16 17 ... unknown
2000 5111000 Deutsche man 1 4 4 7
2000 5111000 Deutsche woman 1 4 4 3
2000 5111000 Auslande man 1 4 4 7
2000 5111000 Auslande woman 1 4 4 3
desired output:
year regional_schlüssel nationality gender 15 16 17 ... unknown
2000 5111000 Deutsche man 2 8 8 10
2000 5111000 Auslande man 2 8 8 10
Then I would like to make 2 new sets of columns for each nationality: De15
, De16
, ...
, unknown
, and Aus15
, Aus16
, ...
, Ausunknown
year regional_schlüssel nationality gender De15 De16 De17 ... Deunknown Aus15 Aus16 Aus17 Ausunknown
2000 5111000 Deutsche man 2 8 8 10 2 8 8 10
Is this possible?
Upvotes: 1
Views: 30
Reputation: 863701
Convert all columns from 5th column to numeric, if no numeric is created missing value:
fruchtdf.iloc[:, 5:] = fruchtdf.iloc[:, 5:].apply(pd.to_numeric, errors='coerce')
Then aggregate sum:
sumcol = ['year', 'regional_schlüssel','nationality', 'gender']
df = fruchtdf.groupby(sumcol).sum()
For first DataFrame convert MultiIndex to columns by reset_index()
:
df1 = df.reset_index()
print (df1)
year regional_schlüssel nationality gender 15 16 17 18 \
0 2000 5111000 Ausländerin man 0.0 4.0 10.0 26.0
1 2000 5111000 Ausländerin woman 0.0 4.0 10.0 30.0
2 2000 5111000 Deutsche man -2.0 16.0 23.0 50.0
3 2000 5111000 Deutsche woman -2.0 9.0 22.0 39.0
4 2000 5113000 Ausländerin man 0.0 1.0 7.0 11.0
... ... ... ... ... ... ... ...
29155 2017 5978036 Deutsche woman 0.0 0.0 0.0 1.0
29156 2017 5978040 Ausländerin man 0.0 0.0 0.0 0.0
29157 2017 5978040 Ausländerin woman 0.0 0.0 0.0 0.0
29158 2017 5978040 Deutsche man -1.0 1.0 1.0 1.0
29159 2017 5978040 Deutsche woman 0.0 0.0 1.0 2.0
19 20 ... 36 37 40 41 42 43 44 45 46 uknwn
0 44.0 68.0 ... 5.0 3.0 2.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0
1 45.0 66.0 ... 5.0 5.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
2 69.0 75.0 ... 16.0 2.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
3 54.0 82.0 ... 15.0 5.0 5.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0
4 20.0 22.0 ... 2.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ...
29155 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
29156 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
29157 0.0 2.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
29158 2.0 3.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
29159 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[29160 rows x 41 columns]
And for second reshape by Series.unstack
and convert first letter of nationality
to columns names by f-string
s:
df2 = df.unstack(2)
df2.columns = df2.columns.map(lambda x: f'{x[1][:3]}{x[0]}')
df2 = df2.reset_index()
print (df2)
year regional_schlüssel gender Aus15 Deu15 Aus16 Deu16 Aus17 \
0 2000 5111000 man 0.0 -2.0 4.0 16.0 10.0
1 2000 5111000 woman 0.0 -2.0 4.0 9.0 10.0
2 2000 5113000 man 0.0 -1.0 1.0 8.0 7.0
3 2000 5113000 woman -1.0 0.0 3.0 6.0 6.0
4 2000 5114000 man 0.0 0.0 0.0 2.0 0.0
... ... ... ... ... ... ... ...
14575 2017 5978032 woman 0.0 0.0 0.0 0.0 0.0
14576 2017 5978036 man 0.0 -2.0 0.0 0.0 1.0
14577 2017 5978036 woman 0.0 0.0 0.0 0.0 0.0
14578 2017 5978040 man 0.0 -1.0 0.0 1.0 0.0
14579 2017 5978040 woman 0.0 0.0 0.0 0.0 0.0
Deu17 Aus18 ... Aus43 Deu43 Aus44 Deu44 Aus45 Deu45 Aus46 \
0 23.0 26.0 ... 1.0 0.0 0.0 0.0 0.0 0.0 1.0
1 22.0 30.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 16.0 11.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 17.0 8.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 4.0 5.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ...
14575 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14576 1.0 2.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14577 0.0 2.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14578 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0
14579 1.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Deu46 Ausuknwn Deuuknwn
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
... ... ...
14575 0.0 0.0 0.0
14576 0.0 0.0 0.0
14577 0.0 0.0 0.0
14578 0.0 0.0 0.0
14579 0.0 0.0 0.0
[14580 rows x 77 columns]
Upvotes: 1