tamasgal

Reputation: 26319

Combine multiple columns into two columns: "column name" and "value"

There is probably an easy way of doing this, so I hope someone has a nice solution (currently I am doing it with ugly for loops).

My data looks like:

In [1]: df = pd.DataFrame({'Ref':  [5, 6, 7],
                           'Col1': [10,11,12],
                           'Col2': [20,21,22],
                           'Col3': [30,31,32]})

In [2]: df
Out[2]:
   Col1  Col2  Col3  Ref
0    10    20    30    5
1    11    21    31    6
2    12    22    32    7

And I am trying to flatten the table (for 2D histograms) to use a single column for the column id and one column for the actual values while keeping the corresponding Ref, like this:

   Ref  Col  Value
0    5    1    10
1    5    2    20
2    5    3    30
3    6    1    11
4    6    2    21
5    6    3    31
6    7    1    12
7    7    2    22
8    7    3    32

I remember there was some kind of a join/group operation to do the reverse operation, but I cannot recall it anymore...

Upvotes: 0

Views: 114

Answers (1)

jspring

Reputation: 71

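Maybe not the most elegant solution, but it works on your data. It uses a combination of pivot_table and stack. Note that pivot_table aggregates with the mean by default, so rows that share the same Ref would be averaged together; that is not an issue here because every Ref is unique.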

import pandas as pd

df = pd.DataFrame({'Ref':  [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})
#    In [23]: df
#    Out[23]: 
#       Col1  Col2  Col3  Ref
#    0    10    20    30    5
#    1    11    21    31    6
#    2    12    22    32    7

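# Index the rows by Ref, then stack the remaining columns so that each
# (Ref, column name) pair becomes one entry of a Series.
piv = df.pivot_table(index=['Ref']).stack()
# Turn the stacked Series back into a flat DataFrame and name the columns.
df2 = pd.DataFrame(piv)
df2.reset_index(inplace=True)
df2.columns = ['Ref', 'Col', 'Value']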

#    In [19]: df2
#    Out[19]: 
#       Ref   Col  Value
#    0    5  Col1     10
#    1    5  Col2     20
#    2    5  Col3     30
#    3    6  Col1     11
#    4    6  Col2     21
#    5    6  Col3     31
#    6    7  Col1     12
#    7    7  Col2     22
#    8    7  Col3     32
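
If the Ref values might repeat, a variant that skips the aggregation step may be safer. The following is only a sketch, assuming set_index plus stack is an acceptable substitute for pivot_table here; the name long_df is just illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'Ref':  [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})

# set_index does not aggregate, so duplicate Ref values would be kept as-is.
long_df = (
    df.set_index('Ref')
      .stack()                          # Series indexed by (Ref, column name)
      .rename_axis(['Ref', 'Col'])      # name both index levels
      .reset_index(name='Value')        # back to a flat DataFrame
)
```

The result should have the same Ref / Col / Value layout as df2 above, one row per (Ref, column) pair.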

If you want 'Col' to be just the last digit of the column name, you could do something like this:

# Keep only the last character of each column name (the result is still a string).
df2['Col'] = df2['Col'].str[-1:]

#    In [21]: df2
#    Out[21]: 
#       Ref Col  Value
#    0    5   1     10
#    1    5   2     20
#    2    5   3     30
#    3    6   1     11
#    4    6   2     21
#    5    6   3     31
#    6    7   1     12
#    7    7   2     22
#    8    7   3     32
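
As an alternative to the pivot_table/stack approach, pandas also has melt, which is meant for this kind of wide-to-long reshape, and pivot, which is probably the "reverse operation" the question is thinking of. This is a rough sketch rather than a drop-in replacement; the names long_df and wide are made up for illustration, and the regex assumes every column name ends in a number.

```python
import pandas as pd

df = pd.DataFrame({'Ref':  [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})

# Wide -> long: keep Ref, turn every other column into (Col, Value) rows.
long_df = df.melt(id_vars='Ref', var_name='Col', value_name='Value')

# Assumption: column names end in digits; keep those digits as an integer id.
long_df['Col'] = long_df['Col'].str.extract(r'(\d+)$', expand=False).astype(int)

# melt lists all of Col1 first, then Col2, ...; sort to group rows by Ref.
long_df = long_df.sort_values(['Ref', 'Col']).reset_index(drop=True)

# Long -> wide again: one row per Ref, one column per Col id.
wide = long_df.pivot(index='Ref', columns='Col', values='Value')
```

Unlike the apply/str version above, this leaves Col as an integer column, which may be more convenient for histogram binning.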

Upvotes: 1
