Reputation: 43
I'm working with a publicly available election data set that I've imported into Pandas as a df:
fips_code county total_2008 dem_2008 gop_2008 oth_2008 total_2012 dem_2012 gop_2012 oth_2012 total_2016 dem_2016 gop_2016 oth_2016
0 26041 Delta County 19064 9974 8763 327 18043 8330 9533 180 18467 6431 11112 924
1 48295 Lipscomb County 1256 155 1093 8 1168 119 1044 5 1322 135 1159 28
2 1127 Walker County 28652 7420 20722 510 28497 6551 21633 313 29243 4486 24208 549
I would like to get to something like this:
fips_code county total dem gop oth year
26041 Delta County 19064 9974 8763 327 2008
48295 Lipscomb County 1256 155 1093 8 2008
1127 Walker County 28652 7420 20722 510 2008
I've searched around and found something similar, Stack and Pivot Dataframe in Python, but I can't figure out how to apply that to my question.
I managed to do a melt of the df:
In [86]:
df_melt = pd.melt(df, id_vars=['fips_code', 'county'], value_name='num_votes')
df_melt.head()
Out [86]:
fips_code county variable num_votes
0 26041 Delta County total_2008 19064
1 48295 Lipscomb County total_2008 1256
2 1127 Walker County total_2008 28652
3 48389 Reeves County total_2008 3077
4 56017 Hot Springs County total_2008 2546
This is where I get stuck because I can't figure out if it's a multi-step process that starts with using the melt, or if there's a straightforward way to get there from the initial df I have. I'm not even sure what functions I should be using, but it looks like it involves pivot/stack/unstack? Any help is much appreciated.
Upvotes: 4
Views: 547
Reputation: 28644
One option to abstract the reshaping process is with
pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(
index = ['fips_code', 'county'],
names_to = ('.value', 'year'),
names_sep = '_')
fips_code county year total dem gop oth
0 26041 Delta County 2008 19064 9974 8763 327
1 48295 Lipscomb County 2008 1256 155 1093 8
2 1127 Walker County 2008 28652 7420 20722 510
3 26041 Delta County 2012 18043 8330 9533 180
4 48295 Lipscomb County 2012 1168 119 1044 5
5 1127 Walker County 2012 28497 6551 21633 313
6 26041 Delta County 2016 18467 6431 11112 924
7 48295 Lipscomb County 2016 1322 135 1159 28
8 1127 Walker County 2016 29243 4486 24208 549
No need to explicitly name the prefixes; the .value
identifies the prefixes, pulls them out, and sets them as headers, while the rest are transposed into the year
column. The dtype of the year column can also be changed to numeric efficiently, via the names_transform
parameter:
df.pivot_longer(
index = ['fips_code', 'county'],
names_to = ('.value', 'year'),
names_transform = {'year' : int},
names_sep = '_')
Upvotes: 0
Reputation: 323226
This is wide to long question again .wide_to_long
pd.wide_to_long(df,['total','dem','gop','oth'],i=['fips_code','county'],j='Year',sep='_').reset_index()
Out[28]:
fips_code county Year total dem gop oth
0 26041 DeltaCounty 2008 19064 9974 8763 327
1 26041 DeltaCounty 2012 18043 8330 9533 180
2 26041 DeltaCounty 2016 18467 6431 11112 924
3 48295 LipscombCounty 2008 1256 155 1093 8
4 48295 LipscombCounty 2012 1168 119 1044 5
5 48295 LipscombCounty 2016 1322 135 1159 28
6 1127 WalkerCounty 2008 28652 7420 20722 510
7 1127 WalkerCounty 2012 28497 6551 21633 313
8 1127 WalkerCounty 2016 29243 4486 24208 549
Upvotes: 4