kjo
kjo

Reputation: 35321

Can't find a way to unstack a dataframe

I'm deeply embarrassed to ask the question in this post, because I'm sure it is completely elementary. And yet I've spent the last two hours poring over the pandas docs, and trying everything under the sun, to no avail...

Consider, for example, a dataframe like this one:

     A   B   C   D      x      y      z
0   a0  b0  c0  d0  0.007  0.710  0.990
1   a0  b0  c0  d1  0.283  0.040  1.027
2   a0  b0  c1  d0  0.017  0.688  2.840
3   a0  b0  c1  d1  0.167  0.132  2.471
4   a0  b1  c0  d0  0.041  0.851  1.078
5   a0  b1  c0  d1  0.235  1.027  1.027
6   a0  b1  c1  d0  0.037  0.934  2.282
7   a0  b1  c1  d1  0.023  1.049  2.826
8   a1  b0  c0  d0  0.912  0.425  1.055
9   a1  b0  c0  d1  0.329  0.932  0.836
10  a1  b0  c1  d0  0.481  0.681  0.997
11  a1  b0  c1  d1  0.782  0.595  2.294
12  a1  b1  c0  d0  0.264  0.918  0.857
13  a1  b1  c0  d1  0.053  1.001  0.920
14  a1  b1  c1  d0  1.161  1.090  1.470
15  a1  b1  c1  d1  0.130  0.992  2.121

Note that each combination of distinct values for the columns A, B, C, and D occurs exactly once in this dataframe. Hence, one can think of this subset of columns as the "key columns", and the remaining columns as the "value columns"1.

I want to produce a new frame from this one, by "unstacking" column C. By this I mean that, for each distinct combination of values of the remaining "key columns" (A, B, and D), the values of the C column give rise to corresponding "value columns" in the new frame.

For example, if we disregard columns x and y for the moment, the unstacking operation I mentioned above would produce the dataframe:

    A   B   D     c0     c1
0  a0  b0  d0  0.990  2.840
1  a0  b0  d1  1.027  2.471
2  a0  b1  d0  1.078  2.282
3  a0  b1  d1  1.027  2.826
4  a1  b0  d0  1.055  0.997
5  a1  b0  d1  0.836  2.294
6  a1  b1  d0  0.857  1.470
7  a1  b1  d1  0.920  2.121

Here the values of the new columns c0 and c1 are obtained from the values (in the original dataframe) of column z, for the corresponding unique combinations of values for columns A, B, and D.

How can I do this?

It'd be nice to be able to do a similar unstacking operation with the original frame (i.e. the one with all value columns x, y, and z), either using some form of hierarchical column organization in the new frame (i.e. where now columns x, y, and z would each consist, in turn, of "subcolums" c0 and c1), or perhaps some scheme for naming the new columns using composite names (e.g. in the new frame columns x, y, and z would be replaced by columns columns x_c0, x_c1, y_c0, y_c1, z_c0, and z_c1, etc.), but this may be hoping for too much...

PS: Here's the data for the dataframe above, in TSV format:

A   B       C       D       x       y       z
a0  b0      c0      d0      0.007   0.71    0.99
a0  b0      c0      d1      0.283   0.04    1.027
a0  b0      c1      d0      0.017   0.688   2.84
a0  b0      c1      d1      0.167   0.132   2.471
a0  b1      c0      d0      0.041   0.851   1.078
a0  b1      c0      d1      0.235   1.027   1.027
a0  b1      c1      d0      0.037   0.934   2.282
a0  b1      c1      d1      0.023   1.049   2.826
a1  b0      c0      d0      0.912   0.425   1.055
a1  b0      c0      d1      0.329   0.932   0.836
a1  b0      c1      d0      0.481   0.681   0.997
a1  b0      c1      d1      0.782   0.595   2.294
a1  b1      c0      d0      0.264   0.918   0.857
a1  b1      c0      d1      0.053   1.001   0.92
a1  b1      c1      d0      1.161   1.09    1.47
a1  b1      c1      d1      0.13    0.992   2.121

1More explicitly, the dataframe is isomorphic to a dict whose keys are the 4-tuples ('a0', 'b0', 'c0', 'd0'), ('a0', 'b0', 'c0', 'd1'), ..., ('a1', 'b1', 'c1', 'd1'), and whose values are the 3-tuples (0.007, 0.710, 0.990), (0.283, 0.040, 1.027), ..., (0.130, 0.992, 2.121). One of the many things I tried was to find a way to index the dataframe by unique combinations of the "key columns". Same story: this should be "hello-world"-level stuff in pandas, but I was not able to find how to do it in the documentation. Clearly my brain and the brain of whoever wrote the pandas docs are perfectly orthogonal to each other...

Upvotes: 2

Views: 300

Answers (1)

DSM
DSM

Reputation: 353369

It'd be nice to be able to do a similar unstacking operation with the original frame (i.e. the one with all value columns x, y, and z), either using some form of hierarchical column organization in the new frame (i.e. where now columns x, y, and z would each consist, in turn, of "subcolums" c0 and c1)

How about:

>>> df.pivot_table(values=["x","y","z"],rows=["A","B","D"], cols="C")
              x             y             z       
C            c0     c1     c0     c1     c0     c1
A  B  D                                           
a0 b0 d0  0.007  0.017  0.710  0.688  0.990  2.840
      d1  0.283  0.167  0.040  0.132  1.027  2.471
   b1 d0  0.041  0.037  0.851  0.934  1.078  2.282
      d1  0.235  0.023  1.027  1.049  1.027  2.826
a1 b0 d0  0.912  0.481  0.425  0.681  1.055  0.997
      d1  0.329  0.782  0.932  0.595  0.836  2.294
   b1 d0  0.264  1.161  0.918  1.090  0.857  1.470
      d1  0.053  0.130  1.001  0.992  0.920  2.121

See here, and note that the default aggregation function is mean. If you know that you don't have more than one value going to the same cell, as here, it doesn't matter much, but I once got myself into trouble because I thought it was sum.

Upvotes: 2

Related Questions