There is probably an easy way of doing this, so I hope someone has a nice solution (currently I am doing it with ugly for loops).
My data looks like:
In [1]: df = pd.DataFrame({'Ref': [5, 6, 7],
                           'Col1': [10, 11, 12],
                           'Col2': [20, 21, 22],
                           'Col3': [30, 31, 32]})
In [2]: df
Out[2]:
   Col1  Col2  Col3  Ref
0    10    20    30    5
1    11    21    31    6
2    12    22    32    7
I am trying to flatten the table (for 2D histograms) into one column holding the column id and one column holding the actual values, while keeping the corresponding Ref, like this:
   Ref  Col  Value
0    5    1     10
1    5    2     20
2    5    3     30
3    6    1     11
4    6    2     21
5    6    3     31
6    7    1     12
7    7    2     22
8    7    3     32
I remember there is some kind of join/group operation that does the reverse, but I cannot recall it anymore...
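For reference, pandas has a built-in wide-to-long reshape, DataFrame.melt, which is the counterpart of pivot. Below is a minimal sketch on the example frame, assuming every value column name ends in its numeric id (the regex that turns 'Col1' into 1 is an illustrative addition, not something from the question):

```python
import pandas as pd

df = pd.DataFrame({'Ref': [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})

# melt keeps 'Ref' as an identifier column and turns every other column
# into (Col, Value) rows, one per original row.
flat = df.melt(id_vars='Ref', var_name='Col', value_name='Value')

# Assumption: column names end in digits, e.g. 'Col1' -> 1, 'Col10' -> 10.
flat['Col'] = flat['Col'].str.extract(r'(\d+)$', expand=False).astype(int)

# melt emits all Col1 rows first, then Col2, ...; sort to group rows by Ref.
flat = flat.sort_values(['Ref', 'Col']).reset_index(drop=True)

print(flat)
```

Sorting after the numeric conversion keeps the order correct even when a name like 'Col10' would sort before 'Col2' as a string.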
Maybe not the most elegant solution, but it works on your data. It uses a combination of pivot_table and stack.
import pandas as pd
df = pd.DataFrame({'Ref': [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})
# In [23]: df
# Out[23]:
#    Col1  Col2  Col3  Ref
# 0    10    20    30    5
# 1    11    21    31    6
# 2    12    22    32    7
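# pivot_table with only an index aggregates every other numeric column per Ref
# (the default aggfunc is the mean); stack() then moves the column names into the row index
piv = df.pivot_table(index=['Ref']).stack()
df2 = pd.DataFrame(piv)
df2.reset_index(inplace=True)  # turn the (Ref, column name) MultiIndex into ordinary columns
df2.columns = ['Ref', 'Col', 'Value']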
# In [19]: df2
# Out[19]:
#    Ref   Col  Value
# 0    5  Col1     10
# 1    5  Col2     20
# 2    5  Col3     30
# 3    6  Col1     11
# 4    6  Col2     21
# 5    6  Col3     31
# 6    7  Col1     12
# 7    7  Col2     22
# 8    7  Col3     32
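A closely related sketch skips pivot_table and stacks directly after set_index. Because no aggregation happens, rows that share a Ref value would be kept rather than averaged. This is an alternative formulation offered for comparison, not the answer's method:

```python
import pandas as pd

df = pd.DataFrame({'Ref': [5, 6, 7],
                   'Col1': [10, 11, 12],
                   'Col2': [20, 21, 22],
                   'Col3': [30, 31, 32]})

# Move 'Ref' into the index, then stack the remaining columns into the rows.
stacked = df.set_index('Ref').stack()

# stacked is a Series indexed by (Ref, column name); name the two index
# levels and the values, then turn everything back into ordinary columns.
df2 = (stacked.rename_axis(['Ref', 'Col'])
              .rename('Value')
              .reset_index())

print(df2)
```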
If you want 'Col' to be just the last digit of the column name, you could do something like this:
# keep only the last character; the result is still a string ('1', '2', '3')
df2.Col = df2.Col.apply(lambda x: x[-1:])
# In [21]: df2
# Out[21]:
#    Ref  Col  Value
# 0    5    1     10
# 1    5    2     20
# 2    5    3     30
# 3    6    1     11
# 4    6    2     21
# 5    6    3     31
# 6    7    1     12
# 7    7    2     22
# 8    7    3     32
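Taking x[-1:] only works while every suffix is a single digit, and it leaves 'Col' as strings. A sketch that captures the whole trailing number and converts it to an integer, so it can be used directly as a numeric histogram axis, is below. The frame here is a small hypothetical example with a two-digit name, made up purely to show the difference:

```python
import pandas as pd

# Hypothetical stacked frame: 'Col10'[-1:] would give '0', not '10'.
df2 = pd.DataFrame({'Ref': [5, 5, 5],
                    'Col': ['Col1', 'Col2', 'Col10'],
                    'Value': [10, 20, 100]})

# Capture the trailing run of digits and convert it to an integer.
df2['Col'] = df2['Col'].str.extract(r'(\d+)$', expand=False).astype(int)

print(df2)
```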