Reputation: 525
I have a big table with rows representing observations. A subset of my columns can be grouped by two meta-categories so I have represented these using a multiindex. However, The multiindex is not applicable to all columns. So all the remaining columns have index labels only in the first level.
I want to apply melt() to some of these multidimensional columns and then merge() them back with my main table later by an index column.
Here is a MRE of the part I'm stuck on. I have a data frame like this:
df = pd.DataFrame({
('INDEX',): [1,2,3],
('a','x'): ['ww','rt','pb'],
('a','y'): [88,97,12],
('b','x'): ['ew','tr','cv'],
('b','y'): [14,42,67],
('c','x'): ['wq','fg','dg'],
('c','y'): [65,78,46]})
df
INDEX a b c
NaN x y x y x y
0 1 ww 88 ew 14 wq 65
1 2 rt 97 tr 42 fg 78
2 3 pb 12 cv 67 dg 46
Now I'd like for this:
df.melt(id_vars=('INDEX',))
...to result in this:
INDEX thing_1 thing_2 value
1 a x ww
2 a x rt
3 a x pb
1 a y 88
2 a y 97
3 a y 12
1 b x ew
2 b x tr
3 b x cv
1 b y 14
2 b y 42
3 b y 67
1 c x wq
2 c x fg
3 c x dg
1 c y 65
2 c y 78
3 c y 46
But I get a long traceback ending in Exception: Data must be 1-dimensional
.
When I first started attempting this I was trying to supply value_vars
but was having no luck; I later realized that del df['INDEX']; df.melt()
works with no trouble except of course that I wanted to keep my INDEX column.
I have also tried some variants such as putting 'INDEX' in a 1-tuple, a 2-tuple ending in np.nan
and slice(None)
, a list, etc. I get the same complaint about 1-dimensional data, or for some cases, ValueError: id_vars must be a list of tuples when columns are a MultiIndex
.
What's the right way to go about this? I found a solution after a bunch of trial and error which I will post below, but it seems like it ought to have been simpler or more elegant. Thanks.
Upvotes: 2
Views: 1702
Reputation: 25239
You may use empty string as 2nd level of INDEX
df = pd.DataFrame({
('INDEX',''): [1,2,3],
('a','x'): ['ww','rt','pb'],
('a','y'): [88,97,12],
('b','x'): ['ew','tr','cv'],
('b','y'): [14,42,67],
('c','x'): ['wq','fg','dg'],
('c','y'): [65,78,46]})
In [198]: df
Out[198]:
INDEX a b c
x y x y x y
0 1 ww 88 ew 14 wq 65
1 2 rt 97 tr 42 fg 78
2 3 pb 12 cv 67 dg 46
df.melt(id_vars='INDEX')
Out[200]:
INDEX variable_0 variable_1 value
0 1 a x ww
1 2 a x rt
2 3 a x pb
3 1 a y 88
4 2 a y 97
5 3 a y 12
6 1 b x ew
7 2 b x tr
8 3 b x cv
9 1 b y 14
10 2 b y 42
11 3 b y 67
12 1 c x wq
13 2 c x fg
14 3 c x dg
15 1 c y 65
16 2 c y 78
17 3 c y 46
Upvotes: 2
Reputation: 525
After I began composing this question, I stumbled on a solution.
If I replace ('INDEX',)
with ('INDEX','foo')
in my DataFrame assignment, then I get the solution I want like this:
df.melt(id_vars=[('INDEX','foo')])
(INDEX, foo) variable_0 variable_1 value
0 1 a x ww
1 2 a x rt
2 3 a x pb
3 1 a y 88
4 2 a y 97
[...]
Upvotes: 0