fffrost
fffrost

Reputation: 1767

stacking columns from pandas dataframe

Apologies if this has been asked in a different way, I'm not sure how else to word it to get what I want to do.

Say I have some dataframe with two columns of values:

import pandas as pd

data = {'col1':[0.6, 0.8, 0.9, 0.87, 0.69, 0.88],
        'col2':[0.72, 0.91, 0.83, 0.85, 0.96, 0.76]}

df = pd.DataFrame(data)

df
Out[1]: 
   col1  col2
0  0.60  0.72
1  0.80  0.91
2  0.90  0.83
3  0.87  0.85
4  0.69  0.96
5  0.88  0.76

I want to take column 2 and 'append' it underneath column 1, continuing the index from 6 to 11. I also would like a new 'identifier' column to be created to have the column name to which each datapoint belongs. The closest I can get to this without lots of spaghetti code is the following:

pd.DataFrame(df.stack()).reset_index()

Out[34]: 
    level_0 level_1     0
0         0    col1  0.60
1         0    col2  0.72
2         1    col1  0.80
3         1    col2  0.91
4         2    col1  0.90
5         2    col2  0.83
6         3    col1  0.87
7         3    col2  0.85
8         4    col1  0.69
9         4    col2  0.96
10        5    col1  0.88
11        5    col2  0.76

I don't want this to be labeled 'level_0', 'level_1', etc. as I have column names in mind that I would like to use.

I'm sure there is an easy and professional way to accomplish this but I don't know enough about it.

Upvotes: 1

Views: 4301

Answers (1)

jezrael
jezrael

Reputation: 862681

You are really close, not necessary call DataFrame constructor.

Also for set columns names is possible rename levels of MultiIndex by rename_axis:

df = df.stack().rename_axis(('a', 'b')).reset_index(name='val')
print (df)
    a     b   val
0   0  col1  0.60
1   0  col2  0.72
2   1  col1  0.80
3   1  col2  0.91
4   2  col1  0.90
5   2  col2  0.83
6   3  col1  0.87
7   3  col2  0.85
8   4  col1  0.69
9   4  col2  0.96
10  5  col1  0.88
11  5  col2  0.76

Numpy alternative:

a = np.repeat(df.index, len(df.columns))
b = np.tile(df.columns, len(df.index))
c = df.values.ravel()

df = pd.DataFrame(np.column_stack([a,b,c]), columns=['a','b','c'])
print (df)
    a     b     c
0   0  col1   0.6
1   0  col2  0.72
2   1  col1   0.8
3   1  col2  0.91
4   2  col1   0.9
5   2  col2  0.83
6   3  col1  0.87
7   3  col2  0.85
8   4  col1  0.69
9   4  col2  0.96
10  5  col1  0.88
11  5  col2  0.76

Upvotes: 3

Related Questions