Reputation: 23068
With a dataframe like this:
date user sales_blue sales_red
2019/06/01 A 30 50
2019/06/01 B 60 70
2019/06/02 A 25 35
2019/06/02 B 42 52
I'm trying to get to the following:
date user color sales
2019/06/01 A blue 30
2019/06/01 A red 50
2019/06/01 B blue 60
2019/06/01 B red 70
2019/06/02 A blue 25
2019/06/02 A red 35
2019/06/02 B blue 42
2019/06/02 B red 52
This is fairly doable in a kind of "manual" way by using loops, but for performance reasons I'd be looking for a more efficient solution which doesn't involve looping in Python.
Upvotes: 1
Views: 434
Reputation: 51155
When you want to use all of a column's name to identify a row, melt
is usually the go-to. However, when you want to use a smaller part of a column's name, where the prefix is a constant, the generic, more user-friendly version of melt
is called wide_to_long
.
Not only will this cleanly handle prefixes, it can be scaled to handle multiple different prefixes.
pd.wide_to_long(
df, stubnames='sales', i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)
color sales
user date
A 2019/06/01 blue 30
2019/06/01 red 50
B 2019/06/01 blue 60
2019/06/01 red 70
A 2019/06/02 blue 25
2019/06/02 red 35
B 2019/06/02 blue 42
2019/06/02 red 52
To generalize this behavior to melt more than a single column:
print(df)
date user sales_blue sales_red calls_blue calls_red
0 2019/06/01 A 30 50 3 4
1 2019/06/01 B 60 70 1 2
2 2019/06/02 A 25 35 4 6
3 2019/06/02 B 42 52 5 7
pd.wide_to_long(
df, stubnames=['sales', 'calls'], i=['date', 'user'], j='color', suffix='\w+', sep='_'
).reset_index(-1)
color sales calls
date user
2019/06/01 A blue 30 3
A red 50 4
B blue 60 1
B red 70 2
2019/06/02 A blue 25 4
A red 35 6
B blue 42 5
B red 52 7
Upvotes: 3
Reputation: 88236
You can use melt
. And in order to remove the prefixes from the colors you can slice the strings with the str
accessor. If you need the specified order just add .sort_values('date')
as @anky suggests.
out = df.melt(id_vars=['date', 'user'], var_name='color', value_name='sales')
out['color'] = out.color.str[6:]
print(out)
date user color sales
0 2019/06/01 A blue 30
1 2019/06/01 B blue 60
2 2019/06/02 A blue 25
3 2019/06/02 B blue 42
4 2019/06/01 A red 50
5 2019/06/01 B red 70
6 2019/06/02 A red 35
7 2019/06/02 B red 52
Upvotes: 3