Pandas stacking DataFrame and concatenating name of columns with index

Question

I have a problem with a specific formatting task in Pandas/Python. My DataFrame looks like this: [image: current DataFrame]

The desired output is like this.

Id  Predicted
1_1 0
1_2 0
1_3 0
1_4 0
1_5 0
1_6 0
1_7 0
1_8 0
1_9 0
2_1 0
2_2 0
2_3 0
2_4 0
2_5 0
2_6 0
2_8 0
2_9 0

The Id is composed of the index label and the column name joined by an underscore, and Predicted is the value at those coordinates in the DataFrame.

For example, 1_1 is index 1, column 1; 1_2 is index 1, column 2; and so on.

I want to write the output to a CSV file, but I do not know how to iterate through the DataFrame to obtain this shape.

joris · Accepted Answer

First, you can reshape the dataframe with stack:

In [29]: df = pd.DataFrame(np.random.randn(3,3))

In [30]: df
Out[30]:
          0         1         2
0 -1.138655 -1.633784  0.328994
1 -0.952137  1.012359  1.327618
2 -1.318940  1.191259  0.133112

In [31]: df2 = df.stack()

In [32]: df2 
Out[32]:
0  0   -1.138655
   1   -1.633784
   2    0.328994
1  0   -0.952137
   1    1.012359
   2    1.327618
2  0   -1.318940
   1    1.191259
   2    0.133112
dtype: float64

This gives you a Series with a MultiIndex (two index levels: the original index and the column names). Then, you can reformat this MultiIndex as follows:

In [33]: df2.index = [str(i) + '_'+ str(j) for i, j in df2.index]

In [34]: df2
Out[34]:
0_0   -1.138655
0_1   -1.633784
0_2    0.328994
1_0   -0.952137
1_1    1.012359
1_2    1.327618
2_0   -1.318940
2_1    1.191259
2_2    0.133112
dtype: float64

Note that I added the _ separator explicitly here, as the column names in my example dataframe did not already contain one.
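
To get all the way to the CSV file with the Id and Predicted headers from the question, something along these lines might work. This is only a sketch: the small all-zero frame with 1-based labels is a made-up stand-in for the real data, and the names stacked, out and csv_text are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the questioner's data: a 2x3 frame of zeros
# with 1-based index labels and 1-based column labels.
df = pd.DataFrame(np.zeros((2, 3), dtype=int), index=[1, 2], columns=[1, 2, 3])

# Stack into a Series with a two-level MultiIndex (index label, column label).
stacked = df.stack()

# Join the two levels into a single "index_column" string label.
stacked.index = [f"{i}_{j}" for i, j in stacked.index]

# Name the index "Id" and move it into a regular column next to the values,
# which are named "Predicted".
out = stacked.rename_axis("Id").reset_index(name="Predicted")

# With no path given, to_csv returns the CSV text as a string; index=False
# leaves out the positional row numbers.
csv_text = out.to_csv(index=False)
print(csv_text)
```

Passing a filename as the first argument, for example out.to_csv("predictions.csv", index=False), writes the same content to disk instead of returning a string.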
