NMSD
NMSD

Reputation: 494

Pandas Flatten a dataframe to a single column

I have dataset in the following format:

df = pd.DataFrame({'x':[1,2,3], 'y':[10,20,30], 'v1':[3,2,3] , 'v2':[13,25,31] })

>> v1 v2  x   y
   3  13  1  10
   2  25  2  20
   3  31  3  30

Setting the index column with x, I want to flatten the data combining v1 and v2 (V), The expected output is like:

>> x   y   V
   1  10   3
   1  10   13
   2  20   2
   2  20   25
   3  30   3
   3  30   31

And again bringing to the original format of df. I tried reshaping using stack and unstack, but I couldn't get it the way, which I was expecting.

Many Thanks!

Upvotes: 6

Views: 4753

Answers (2)

unutbu
unutbu

Reputation: 879411

pd.lreshape can reformat wide data to long format:

In [55]: pd.lreshape(df, {'V':['v1', 'v2']})
Out[57]: 
   x   y   V
0  1  10   3
1  2  20   2
2  3  30   3
3  1  10  13
4  2  20  25
5  3  30  31

lreshape is an undocumented "experimental" feature. To learn more about lreshape see help(pd.lreshape).


If you need reversible operations, use jezrael's pd.melt solution to go from wide to long format, and use pivot_table to go from long to wide format:

In [72]: melted = pd.melt(df, id_vars=['x', 'y'], value_name='V'); melted
Out[72]: 
   x   y variable   V
0  1  10       v1   3
1  2  20       v1   2
2  3  30       v1   3
3  1  10       v2  13
4  2  20       v2  25
5  3  30       v2  31

In [74]: df2 = melted.pivot_table(index=['x','y'], columns=['variable'], values='V').reset_index(); df2
Out[74]: 
variable  x   y  v1  v2
0         1  10   3  13
1         2  20   2  25
2         3  30   3  31

Notice that you must hang on to the variable column if you wish to return to df2. Also keep in mind that it is more efficient to simply retain a reference to df than to recompute it using melted and pivot_table.

Upvotes: 5

jezrael
jezrael

Reputation: 862591

You can use stack with set_index. Last drop column level_2:

print (df.set_index(['x','y']).stack().reset_index(name='V').drop('level_2', axis=1))
   x   y   V
0  1  10   3
1  1  10  13
2  2  20   2
3  2  20  25
4  3  30   3
5  3  30  31

Another solution with melt and sort_values:

print (pd.melt(df, id_vars=['x','y'], value_name='V')
         .drop('variable', axis=1)
         .sort_values('x'))

   x   y   V
0  1  10   3
3  1  10  13
1  2  20   2
4  2  20  25
2  3  30   3
5  3  30  31

Upvotes: 3

Related Questions