Reputation: 4306
We start with this data:
import numpy as np
import pandas as pd
data=pd.DataFrame(data=np.random.rand(10,5),columns=['headA','headB','tailA','tailB','tailC'])
Now I want to perform a certain unstack operation which unstacks only the last three columns. Hence, this should be the new index:
pd.MultiIndex.from_product([data.columns[-3:],data.index])
MultiIndex([('tailA', 0),('tailA', 1),('tailA', 2),('tailA', 3),('tailA', 4),('tailA', 5),('tailA', 6),('tailA', 7),('tailA', 8),('tailA', 9),('tailB', 0),('tailB', 1),('tailB', 2),('tailB', 3),('tailB', 4),('tailB', 5),('tailB', 6),('tailB', 7),('tailB', 8),('tailB', 9),('tailC', 0),('tailC', 1),('tailC', 2),('tailC', 3),('tailC', 4),('tailC', 5),('tailC', 6),('tailC', 7),('tailC', 8),('tailC', 9)],)
I think, to perform this, I have to put the first two columns into another level than the remaining three columns. I don't know how I can do that in an elegant way. Any suggestions?
Upvotes: 1
Views: 35
Reputation: 262254
You could melt
with ignore_index=False
, then set the new variable column as index with set_index
append=True
:
out = (data.melt(data.columns[:-3], ignore_index=False, var_name='tail')
.set_index('tail', append=True).swaplevel()
)
Alternatively with a more classical stack
:
cols = data.columns[:-3].tolist()
out = (data.set_index(cols, append=True).stack()
.reset_index(cols, name='value')
.swaplevel()
#.sort_index() # only if order is important
)
Or with lreshape
:
cols = data.columns[-3:]
out = (pd.lreshape(data, {'value': cols})
.set_index(pd.MultiIndex.from_product([cols, data.index]))
)
Output:
headA headB value
tail
tailA 0 0.587533 0.472803 0.650371
1 0.465910 0.384050 0.623001
2 0.376165 0.724400 0.741483
3 0.589693 0.243686 0.833794
4 0.695016 0.808199 0.599436
5 0.976881 0.594713 0.315471
6 0.599184 0.263577 0.701009
7 0.524138 0.786585 0.975606
8 0.030385 0.903692 0.722221
9 0.175104 0.600267 0.830920
tailB 0 0.587533 0.472803 0.031079
1 0.465910 0.384050 0.693012
2 0.376165 0.724400 0.455640
3 0.589693 0.243686 0.331427
4 0.695016 0.808199 0.971165
5 0.976881 0.594713 0.277134
6 0.599184 0.263577 0.720298
7 0.524138 0.786585 0.786354
8 0.030385 0.903692 0.701858
9 0.175104 0.600267 0.854581
tailC 0 0.587533 0.472803 0.991319
1 0.465910 0.384050 0.167928
2 0.376165 0.724400 0.336915
3 0.589693 0.243686 0.193647
4 0.695016 0.808199 0.877337
5 0.976881 0.594713 0.026955
6 0.599184 0.263577 0.138335
7 0.524138 0.786585 0.727579
8 0.030385 0.903692 0.087960
9 0.175104 0.600267 0.578285
10 rows
# melt + set_index
2.01 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# stack (without sort_index)
1.5 ms ± 48.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# lreshape
1.42 ms ± 35 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
100k rows
# melt + set_index
38.3 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# stack (without sort_index)
12.6 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# stack (with sort_index)
45.2 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# lreshape
29.2 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1