nivek
nivek

Reputation: 525

Melt a bunch of multiIndexed columns while keeping a single 'index' column

I have a big table with rows representing observations. A subset of my columns can be grouped by two meta-categories so I have represented these using a multiindex. However, The multiindex is not applicable to all columns. So all the remaining columns have index labels only in the first level.

I want to apply melt() to some of these multidimensional columns and then merge() them back with my main table later by an index column.

Here is a MRE of the part I'm stuck on. I have a data frame like this:

df = pd.DataFrame({
  ('INDEX',): [1,2,3],
  ('a','x'): ['ww','rt','pb'],
  ('a','y'): [88,97,12],
  ('b','x'): ['ew','tr','cv'],
  ('b','y'): [14,42,67],
  ('c','x'): ['wq','fg','dg'],
  ('c','y'): [65,78,46]})

df
  INDEX   a       b       c
    NaN   x   y   x   y   x   y
0     1  ww  88  ew  14  wq  65
1     2  rt  97  tr  42  fg  78
2     3  pb  12  cv  67  dg  46

Now I'd like for this:

df.melt(id_vars=('INDEX',))

...to result in this:

INDEX thing_1 thing_2 value
    1       a       x    ww
    2       a       x    rt
    3       a       x    pb
    1       a       y    88
    2       a       y    97
    3       a       y    12
    1       b       x    ew
    2       b       x    tr
    3       b       x    cv
    1       b       y    14
    2       b       y    42
    3       b       y    67
    1       c       x    wq
    2       c       x    fg
    3       c       x    dg
    1       c       y    65
    2       c       y    78
    3       c       y    46

But I get a long traceback ending in Exception: Data must be 1-dimensional.

When I first started attempting this I was trying to supply value_vars but was having no luck; I later realized that del df['INDEX']; df.melt() works with no trouble except of course that I wanted to keep my INDEX column.

I have also tried some variants such as putting 'INDEX' in a 1-tuple, a 2-tuple ending in np.nan and slice(None), a list, etc. I get the same complaint about 1-dimensional data, or for some cases, ValueError: id_vars must be a list of tuples when columns are a MultiIndex.

What's the right way to go about this? I found a solution after a bunch of trial and error which I will post below, but it seems like it ought to have been simpler or more elegant. Thanks.

Upvotes: 2

Views: 1702

Answers (2)

Andy L.
Andy L.

Reputation: 25239

You may use empty string as 2nd level of INDEX

df = pd.DataFrame({
  ('INDEX',''): [1,2,3],
  ('a','x'): ['ww','rt','pb'],
  ('a','y'): [88,97,12],
  ('b','x'): ['ew','tr','cv'],
  ('b','y'): [14,42,67],
  ('c','x'): ['wq','fg','dg'],
  ('c','y'): [65,78,46]})

In [198]: df
Out[198]:
  INDEX   a       b       c
          x   y   x   y   x   y
0     1  ww  88  ew  14  wq  65
1     2  rt  97  tr  42  fg  78
2     3  pb  12  cv  67  dg  46

df.melt(id_vars='INDEX')

Out[200]:
    INDEX variable_0 variable_1 value
0       1          a          x    ww
1       2          a          x    rt
2       3          a          x    pb
3       1          a          y    88
4       2          a          y    97
5       3          a          y    12
6       1          b          x    ew
7       2          b          x    tr
8       3          b          x    cv
9       1          b          y    14
10      2          b          y    42
11      3          b          y    67
12      1          c          x    wq
13      2          c          x    fg
14      3          c          x    dg
15      1          c          y    65
16      2          c          y    78
17      3          c          y    46

Upvotes: 2

nivek
nivek

Reputation: 525

After I began composing this question, I stumbled on a solution.

If I replace ('INDEX',) with ('INDEX','foo') in my DataFrame assignment, then I get the solution I want like this:

df.melt(id_vars=[('INDEX','foo')])

    (INDEX, foo) variable_0 variable_1 value
0              1          a          x    ww
1              2          a          x    rt
2              3          a          x    pb
3              1          a          y    88
4              2          a          y    97
[...]

Upvotes: 0

Related Questions