Un-flatten rows in dataframe

Question

With a dataframe like this:

date           user    sales_blue     sales_red
2019/06/01        A            30            50
2019/06/01        B            60            70
2019/06/02        A            25            35
2019/06/02        B            42            52

I'm trying to get to the following:

date           user    color    sales
2019/06/01        A     blue       30
2019/06/01        A      red       50
2019/06/01        B     blue       60
2019/06/01        B      red       70
2019/06/02        A     blue       25
2019/06/02        A      red       35
2019/06/02        B     blue       42
2019/06/02        B      red       52

This is fairly doable in a kind of "manual" way by using loops, but for performance reasons I'd be looking for a more efficient solution which doesn't involve looping in Python.

user3483203 · Accepted Answer

When you want to use all of a column's name to identify a row, melt is usually the go-to. However, when you want to use a smaller part of a column's name, where the prefix is a constant, the generic, more user-friendly version of melt is called wide_to_long.

Not only will this cleanly handle prefixes, it can be scaled to handle multiple different prefixes.

pd.wide_to_long(
    df, stubnames='sales', i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)

                color  sales
user date
A    2019/06/01  blue     30
     2019/06/01   red     50
B    2019/06/01  blue     60
     2019/06/01   red     70
A    2019/06/02  blue     25
     2019/06/02   red     35
B    2019/06/02  blue     42
     2019/06/02   red     52

To generalize this behavior to melt more than a single column:

print(df)

         date user  sales_blue  sales_red  calls_blue  calls_red
0  2019/06/01    A          30         50           3          4
1  2019/06/01    B          60         70           1          2
2  2019/06/02    A          25         35           4          6
3  2019/06/02    B          42         52           5          7

pd.wide_to_long(
  df, stubnames=['sales', 'calls'], i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)

                color  sales  calls
date       user
2019/06/01 A     blue     30      3
           A      red     50      4
           B     blue     60      1
           B      red     70      2
2019/06/02 A     blue     25      4
           A      red     35      6
           B     blue     42      5
           B      red     52      7

Un-flatten rows in dataframe

Answers (2)

Related Questions