Jivan
Jivan

Reputation: 23068

Un-flatten rows in dataframe

With a dataframe like this:

date           user    sales_blue     sales_red
2019/06/01        A            30            50
2019/06/01        B            60            70
2019/06/02        A            25            35
2019/06/02        B            42            52

I'm trying to get to the following:

date           user    color    sales
2019/06/01        A     blue       30
2019/06/01        A      red       50
2019/06/01        B     blue       60
2019/06/01        B      red       70
2019/06/02        A     blue       25
2019/06/02        A      red       35
2019/06/02        B     blue       42
2019/06/02        B      red       52

This is fairly doable in a kind of "manual" way by using loops, but for performance reasons I'd be looking for a more efficient solution which doesn't involve looping in Python.

Upvotes: 1

Views: 434

Answers (2)

user3483203
user3483203

Reputation: 51155

When you want to use all of a column's name to identify a row, melt is usually the go-to. However, when you want to use a smaller part of a column's name, where the prefix is a constant, the generic, more user-friendly version of melt is called wide_to_long.

Not only will this cleanly handle prefixes, it can be scaled to handle multiple different prefixes.

pd.wide_to_long(
    df, stubnames='sales', i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)

                color  sales
user date
A    2019/06/01  blue     30
     2019/06/01   red     50
B    2019/06/01  blue     60
     2019/06/01   red     70
A    2019/06/02  blue     25
     2019/06/02   red     35
B    2019/06/02  blue     42
     2019/06/02   red     52

To generalize this behavior to melt more than a single column:

print(df)

         date user  sales_blue  sales_red  calls_blue  calls_red
0  2019/06/01    A          30         50           3          4
1  2019/06/01    B          60         70           1          2
2  2019/06/02    A          25         35           4          6
3  2019/06/02    B          42         52           5          7

pd.wide_to_long(
  df, stubnames=['sales', 'calls'], i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)

                color  sales  calls
date       user
2019/06/01 A     blue     30      3
           A      red     50      4
           B     blue     60      1
           B      red     70      2
2019/06/02 A     blue     25      4
           A      red     35      6
           B     blue     42      5
           B      red     52      7

Upvotes: 3

yatu
yatu

Reputation: 88236

You can use melt. And in order to remove the prefixes from the colors you can slice the strings with the str accessor. If you need the specified order just add .sort_values('date') as @anky suggests.

out = df.melt(id_vars=['date', 'user'], var_name='color', value_name='sales')
out['color'] = out.color.str[6:]

print(out)

       date    user color  sales
0  2019/06/01    A  blue     30
1  2019/06/01    B  blue     60
2  2019/06/02    A  blue     25
3  2019/06/02    B  blue     42
4  2019/06/01    A   red     50
5  2019/06/01    B   red     70
6  2019/06/02    A   red     35
7  2019/06/02    B   red     52

Upvotes: 3

Related Questions