Pandas: "distribute" column values into multiple rows

Question

I am a pandas newbie, and I am trying to solve the following problem.

I have a large DataFrame (10000 x 28) as follows.

Col1 Col2 Col3 Col4 Col5
  A    B   C    D    E

How can I reshape it to look like the following?

Col1 Col2 NewColName
  A    B    C
  A    B    D
  A    B    E

The first two columns remain the same, and the last three columns get "distributed" over them.

My goal is to take a large DataFrame and turn it into "tidy data".

I have tried methods like concat, stack etc. but I feel I'm missing some elegant way to solve this.

Stefan · Accepted Answer

Set ['Col1', 'Col2'] as the index and use .stack().

df.set_index(['Col1', 'Col2']).stack()

Col1  Col2
A     B     Col3    C
            Col4    D
            Col5    E

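Then chain two .reset_index() calls on the stacked result to get the format in your example. The first drops the innermost index level left over from .stack(); the second turns Col1 and Col2 back into columns. Passing name='Col' names the value column, as suggested by @jezrael:

df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')

  Col1 Col2 Col
0    A    B   C
1    A    B   D
2    A    B   E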
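
For anyone who wants to try this end to end, here is a minimal sketch of the steps above as one chain. The two-row frame and its values are made up for illustration (the question only shows one row), and the value column is named NewColName to match the question. The melt line at the end is an alternative that is not part of this answer; it gives the same rows, but grouped by source column rather than by original row.

```python
import pandas as pd

# Hypothetical two-row frame shaped like the one in the question.
df = pd.DataFrame({
    "Col1": ["A", "F"],
    "Col2": ["B", "G"],
    "Col3": ["C", "H"],
    "Col4": ["D", "I"],
    "Col5": ["E", "J"],
})

tidy = (
    df.set_index(["Col1", "Col2"])        # keep the identifier columns out of the reshape
      .stack()                            # Col3..Col5 become an inner index level
      .reset_index(level=-1, drop=True)   # discard that level (the old column names)
      .reset_index(name="NewColName")     # restore Col1/Col2 as columns, name the values
)
print(tidy)

# Alternative, not from the answer above: melt produces the same rows,
# ordered by source column (all Col3 values, then Col4, then Col5).
melted = (
    df.melt(id_vars=["Col1", "Col2"], value_name="NewColName")
      .drop(columns="variable")
)
print(melted)
```

With stack the rows for each original record stay together, which is usually what you want when the order of the source rows matters.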

Answers (2)