Reputation: 125
I have a problem with a specific kind of formatting in pandas/Python. My DataFrame looks like this: (image: current DataFrame)
The desired output looks like this:
Id Predicted
1_1 0
1_2 0
1_3 0
1_4 0
1_5 0
1_6 0
1_7 0
1_8 0
1_9 0
2_1 0
2_2 0
2_3 0
2_4 0
2_5 0
2_6 0
2_8 0
2_9 0
Here the Id is the index label concatenated with the column name, and Predicted is the predicted value at those coordinates in the DataFrame.
So 1_1 is index 1, column 1; 1_2 is index 1, column 2; and so on.
I want to write this output to a CSV file, but I do not know how to iterate through the DataFrame to get this shape.
Upvotes: 1
Views: 579
Reputation: 139162
First, you can reshape the dataframe with stack:
In [29]: df = pd.DataFrame(np.random.randn(3,3))
In [30]: df
Out[30]:
          0         1         2
0 -1.138655 -1.633784  0.328994
1 -0.952137  1.012359  1.327618
2 -1.318940  1.191259  0.133112
In [31]: df2 = df.stack()
In [32]: df2
Out[32]:
0  0   -1.138655
   1   -1.633784
   2    0.328994
1  0   -0.952137
   1    1.012359
   2    1.327618
2  0   -1.318940
   1    1.191259
   2    0.133112
dtype: float64
This gives you a Series with a MultiIndex (two index levels: the first from the original index, the second from the column names). Then, you can flatten this MultiIndex into single string labels as follows:
In [33]: df2.index = [str(i) + '_'+ str(j) for i, j in df2.index]
In [34]: df2
Out[34]:
0_0   -1.138655
0_1   -1.633784
0_2    0.328994
1_0   -0.952137
1_1    1.012359
1_2    1.327618
2_0   -1.318940
2_1    1.191259
2_2    0.133112
dtype: float64
Note that I included a '_' separator here, because the column names in my example dataframe do not already contain one.
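To cover the CSV part of the question as well, here is a minimal end-to-end sketch of the same approach. The small all-zero frame is a hypothetical stand-in for the asker's data (the real frame is only shown as an image), and the Id and Predicted header names are taken from the desired output in the question. It writes to an in-memory buffer so the result can be inspected; passing a file path to to_csv instead would write to disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the asker's frame: index and columns both start at 1.
df = pd.DataFrame(0, index=[1, 2], columns=[1, 2, 3])

# stack() moves the column labels into a second index level,
# giving a Series with one value per (row, column) pair.
stacked = df.stack()

# Join the two index levels into "row_column" labels.
stacked.index = [f"{i}_{j}" for i, j in stacked.index]

# Name the values and the index so they become the CSV header,
# then turn the index into an ordinary column.
out = stacked.rename("Predicted").rename_axis("Id").reset_index()

# Write to an in-memory buffer; a path such as "predictions.csv" also works.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Calling reset_index before to_csv is one way to get both header names; writing the Series directly with an index label should work too, but the two-column frame is easier to check by eye.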
Upvotes: 1