elPastor
elPastor

Reputation: 8996

Pandas: Partial pivot based on header substring

I'm familiar with the standard Pandas pivot functionality, but what I'm looking to do is a little different - and I'm not married to using pivot.

Setup code:

import pandas as pd
import io

csvA = io.StringIO(u'''
month   chicken_a   chicken_b   chicken_c   turkey_a    turkey_b    turkey_c
1   10  20  30  1   2   3
2   11  22  33  101 202 303
''')

dfA = pd.read_csv(csvA, sep = '\t')

Which yields:

   month  chicken_a  chicken_b  chicken_c  turkey_a  turkey_b  turkey_c
0      1         10         20         30         1         2         3
1      2         11         22         33       101       202       303

I want to partially pivot the '_a', '_b', and '_c' to the month, but keep the 'chicken' and 'turkey' headers. The final result would look like this:

  month  chicken  turkey
0    1a       10       1
1    1b       20       2
2    1c       30       3
3    2a       11     101
4    2b       22     202
5    2c       33     303

The '_a', '_b', '_c' part will always be exactly that, and will be known ahead of time.

I could hack this via for loops, but I'm wondering if there is a more pandanic way.

Upvotes: 2

Views: 395

Answers (3)

onietosi
onietosi

Reputation: 281

pd.wide_to_long works nicely in this case:

dfB = pd.wide_to_long(dfA,['chicken','turkey'],i='month',j='suf',suffix='_\w')

results in

           chicken  turkey
month suf                 
1     _a        10       1
2     _a        11     101
1     _b        20       2
2     _b        22     202
1     _c        30       3
2     _c        33     303

You can then concatenate the two indices into a single month column

dfB.reset_index(inplace = True)
dfB['month'] = dfB.astype({'month':'str'}).month + dfB.suf.str[1]
dfB.drop(columns='suf', inplace = True)

The output is as desired

  month  chicken  turkey
0   1a       10       1
1   2a       11     101
2   1b       20       2
3   2b       22     202
4   1c       30       3
5   2c       33     303

Upvotes: 1

Ben.T
Ben.T

Reputation: 29635

After set_index the column month, you can change the column to MultiIndex by splitting the name of the columns' names with '_'. After using stack, you just need to join the MultiIndex to simple Index, followed by a reset_index:

#first convert month column to str, for later join
dfA['month'] = dfA['month'].astype(str)
#set month as index
dfA = dfA.set_index('month')
#change the column to MultiIndex using split
dfA.columns = pd.MultiIndex.from_tuples([col.split('_') for col in dfA.columns])
# stack
dfA = dfA.stack()
# simple index from the MultiIndex
dfA.index = pd.Index([''.join(ind) for ind in dfA.index], name='month')
#reset index
dfA = dfA.reset_index()

and you get

print (dfA)
  month  chicken  turkey
0    1a       10       1
1    1b       20       2
2    1c       30       3
3    2a       11     101
4    2b       22     202
5    2c       33     303

Upvotes: 1

elPastor
elPastor

Reputation: 8996

I found a solution that I don't love, and still feels a bit hacky, but there are no loops and it gets the job done. I'm definitely open to better solutions:

df  = dfA.set_index('month').stack().reset_index()

df['month_type']    = df['month'].astype(str) + df['level_1'].str[-1:]
df['level_1']       = df['level_1'].str[:-2]

df  = df.drop(['month'], axis = 1)
df  = df.pivot(index = 'month_type', columns = 'level_1', values = 0)
df  = pd.DataFrame(df.to_records())

Upvotes: 1

Related Questions