MRTN
MRTN

Reputation: 221

Reshape dataframe using pandas melt to get two value columns

my data is in the following format

x = pd.DataFrame([
    {'date': '2011-01-01', 'col1': '1','col2': '5', 'A_Q': '1', 'A_W': 'aa', 'B_Q': '2', 'B_W': 'zz'},
    {'date': '2011-01-02', 'col1': '1','col2': '9', 'A_Q': '-1', 'A_W': 'bb', 'B_Q': '3', 'B_W': 'rr'},
    {'date': '2011-01-03', 'col1': '3','col2': '3', 'A_Q': '0', 'A_W': 'cc', 'B_Q': '4', 'B_W': 'vv'},
    {'date': '2011-02-04', 'col1': '4','col2': '1', 'A_Q': '3', 'A_W': 'dd', 'B_Q': '5', 'B_W': 'gg'},
])
    date      col1 col2 A_Q A_W  B_Q B_W
0   2011-01-01  1    5   1   aa   2  zz
1   2011-01-02  1    9   -1  bb   3  rr
2   2011-01-03  3    3   0   cc   4  vv
3   2011-02-04  4    1   3   dd   5  gg

I would like to reshape the dataframe using melt or similar functions, with two output value columns. Any ideas on how to do this without splitting the input array?


     date     col1 col2 VAR Q   W
0   2011-01-01  1   5   A   1   aa
1   2011-01-01  1   5   B   2   zz
2   2011-01-02  1   9   A   -1  bb
3   2011-01-02  1   9   B   3   rr
4   2011-01-03  3   3   A   0   cc
5   2011-01-03  3   3   B   4   vv
6   2011-01-04  4   1   A   3   dd
7   2011-01-04  4   1   B   5   gg

Upvotes: 1

Views: 124

Answers (2)

jezrael
jezrael

Reputation: 862591

First idea is create MultiIndex by split columns with _ and reshape by DataFrame.stack:

df = x.set_index(['date','col1', 'col2'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).reset_index().rename(columns={'level_3':'VAR'})
print (df)
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg

Or use wide_to_long with rshape stack and Series.unstack:

df = (pd.wide_to_long(x,stubnames=["A","B"],
                       i=['date','col1', 'col2'],
                       j="new", sep="_",suffix=".*")
        .stack()
        .unstack(-2)
        .reset_index()
        .rename(columns={'level_3':'VAR'}))
print (df)
new        date col1 col2 VAR   Q   W
0    2011-01-01    1    5   A   1  aa
1    2011-01-01    1    5   B   2  zz
2    2011-01-02    1    9   A  -1  bb
3    2011-01-02    1    9   B   3  rr
4    2011-01-03    3    3   A   0  cc
5    2011-01-03    3    3   B   4  vv
6    2011-02-04    4    1   A   3  dd
7    2011-02-04    4    1   B   5  gg

Or swap values with _ between first and last to last and first, so need only wide_to_long:

df1 = x.copy()
df1.columns = df1.columns.str.replace(r'(\w+)_(\w+)', r'\2_\1', regex=True)

#thank you sammywemmy for alternative
df1.columns = df1.columns.str.split('_').str[::-1].str.join('_')

df1 = pd.wide_to_long(df1,stubnames=["Q","W"],
                        i=['date','col1', 'col2'],
                        j="VAR", sep="_",suffix=".*").reset_index()
print (df1)
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg

Upvotes: 3

sammywemmy
sammywemmy

Reputation: 28644

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
x.pivot_longer(index = ['date', 'col1', 'col2'], 
               names_to = ('VAR', '.value'), 
               names_sep='_', 
               sort_by_appearance=True)
 
         date col1 col2 VAR   Q   W
0  2011-01-01    1    5   A   1  aa
1  2011-01-01    1    5   B   2  zz
2  2011-01-02    1    9   A  -1  bb
3  2011-01-02    1    9   B   3  rr
4  2011-01-03    3    3   A   0  cc
5  2011-01-03    3    3   B   4  vv
6  2011-02-04    4    1   A   3  dd
7  2011-02-04    4    1   B   5  gg

Upvotes: 1

Related Questions