Reputation: 79
I have a DataFrame and I need to turn one column into multiple columns, and then create another column that index/labels values of the new/multiple columns
import pandas as pd
df = pd.DataFrame({'state':['AK','AK','AK','AK','AL','AL','AL','AL'], 'county':['Cnty1','Cnty1','Cnty2','Cnty2','Cnty3','Cnty3','Cnty4','Cnty4'],
'year':['2000','2001','2000','2001','2000','2001','2000','2001'], 'count1':[5,7,4,8,9,1,0,1], 'count2':[8,1,4,6,7,3,8,5]})
Using pivot_table()
and reset_index()
I'm able to move the values of year into columns, but not able to dis-aggregate it by the other columns.
Using: pivotDF = pd.pivot_table(df, index = ['state', 'county'], columns = 'year') pivotDF = pivotDF.reset_index()
Gets me close, but not what I need.
What I need is, another column that labels count1 and count2, with the values in the year columns. Something that looks like this:
I realize a DataFrame would have all the values for 'state' and 'county' filled in, which is fine, but I'm outputting this to Excel and need it to look just like this so if there's a way to have this format that would be a bonus.
Many thanks.
Upvotes: 0
Views: 670
Reputation: 5331
You've got most of the answer down. Just add a stack with level=0
to stack on that level rather than the default year level.
pd.pivot_table(df, index=['state', 'county'], columns='year', values=['count1', 'count2']) \
.stack(level=0)
Upvotes: 1
Reputation: 323226
You are looking for pivot
then stack
s=df.pivot_table(index=['state','county'],columns='year',values=['count1','count2'],aggfunc='mean').stack(level=0)
s
Out[142]:
year 2000 2001
state county
AK Cnty1 count1 5 7
count2 8 1
Cnty2 count1 4 8
count2 4 6
AL Cnty3 count1 9 1
count2 7 3
Cnty4 count1 0 1
count2 8 5
Upvotes: 2