Reputation: 8996
I'm familiar with the standard Pandas pivot functionality, but what I'm looking to do is a little different - and I'm not married to using pivot
.
Setup code:
import pandas as pd
import io
csvA = io.StringIO(u'''
month chicken_a chicken_b chicken_c turkey_a turkey_b turkey_c
1 10 20 30 1 2 3
2 11 22 33 101 202 303
''')
dfA = pd.read_csv(csvA, sep = '\t')
Which yields:
month chicken_a chicken_b chicken_c turkey_a turkey_b turkey_c
0 1 10 20 30 1 2 3
1 2 11 22 33 101 202 303
I want to partially pivot the '_a', '_b', and '_c'
to the month, but keep the 'chicken' and 'turkey' headers. The final result would look like this:
month chicken turkey
0 1a 10 1
1 1b 20 2
2 1c 30 3
3 2a 11 101
4 2b 22 202
5 2c 33 303
The '_a', '_b', '_c'
part will always be exactly that, and will be known ahead of time.
I could hack this via for
loops, but I'm wondering if there is a more pandanic way.
Upvotes: 2
Views: 395
Reputation: 281
pd.wide_to_long
works nicely in this case:
dfB = pd.wide_to_long(dfA,['chicken','turkey'],i='month',j='suf',suffix='_\w')
results in
chicken turkey
month suf
1 _a 10 1
2 _a 11 101
1 _b 20 2
2 _b 22 202
1 _c 30 3
2 _c 33 303
You can then concatenate the two indices into a single month
column
dfB.reset_index(inplace = True)
dfB['month'] = dfB.astype({'month':'str'}).month + dfB.suf.str[1]
dfB.drop(columns='suf', inplace = True)
The output is as desired
month chicken turkey
0 1a 10 1
1 2a 11 101
2 1b 20 2
3 2b 22 202
4 1c 30 3
5 2c 33 303
Upvotes: 1
Reputation: 29635
After set_index
the column month, you can change the column to MultiIndex
by splitting the name of the columns' names with '_'
. After using stack
, you just need to join
the MultiIndex
to simple Index
, followed by a reset_index
:
#first convert month column to str, for later join
dfA['month'] = dfA['month'].astype(str)
#set month as index
dfA = dfA.set_index('month')
#change the column to MultiIndex using split
dfA.columns = pd.MultiIndex.from_tuples([col.split('_') for col in dfA.columns])
# stack
dfA = dfA.stack()
# simple index from the MultiIndex
dfA.index = pd.Index([''.join(ind) for ind in dfA.index], name='month')
#reset index
dfA = dfA.reset_index()
and you get
print (dfA)
month chicken turkey
0 1a 10 1
1 1b 20 2
2 1c 30 3
3 2a 11 101
4 2b 22 202
5 2c 33 303
Upvotes: 1
Reputation: 8996
I found a solution that I don't love, and still feels a bit hacky, but there are no loops and it gets the job done. I'm definitely open to better solutions:
df = dfA.set_index('month').stack().reset_index()
df['month_type'] = df['month'].astype(str) + df['level_1'].str[-1:]
df['level_1'] = df['level_1'].str[:-2]
df = df.drop(['month'], axis = 1)
df = df.pivot(index = 'month_type', columns = 'level_1', values = 0)
df = pd.DataFrame(df.to_records())
Upvotes: 1