Reputation: 879
I have the following simplified DataFrame
:
import pandas as pd
pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value_x':'x1', 'value_y':'y1'},
{'index_a':'a2', 'index_b':'b2', 'value_x':'x2', 'value_y':'y2'},
{'index_a':'a3', 'index_b':'b3', 'value_x':'x3', 'value_y':'y3'}])
It contains two indices and two value columns. For downstream usage, it does not make sense to have two value columns (they are from the same distribution). I therefore want to 'explode' these columns and make one large list. This is what should result:
pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value':'x1'},
{'index_a':'a1', 'index_b':'b1', 'value':'x1'},
{'index_a':'a2', 'index_b':'b2', 'value':'x2'},
{'index_a':'a2', 'index_b':'b2', 'value':'y2'},
{'index_a':'a3', 'index_b':'b3', 'value':'x3'},
{'index_a':'a3', 'index_b':'b3', 'value':'y3'}])
I tried isolating values via .value
and .ravel()
but none yielded the desired results.
Thanks in advance. BBQuercus :)
Upvotes: 1
Views: 151
Reputation: 11
If you run the code below
import pandas as pd
df = pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value_x':'x1', 'value_y':'y1'},
{'index_a':'a2', 'index_b':'b2', 'value_x':'x2', 'value_y':'y2'},
{'index_a':'a3', 'index_b':'b3', 'value_x':'x3', 'value_y':'y3'}])
The result will be
index_a index_b value_x value_y
0 a1 b1 x1 y1
1 a2 b2 x2 y2
2 a3 b3 x3 y3
So here you can select columns ['index_a', 'index_b', 'value_x']
df1 = df[['index_a', 'index_b', 'value_x']]
index_a index_b value_x
0 a1 b1 x1
1 a2 b2 x2
2 a3 b3 x3
Similarly select columns ['index_a', 'index_b', 'value_y']
df2 = df[['index_a', 'index_b', 'value_y']]
index_a index_b value_y
0 a1 b1 y1
1 a2 b2 y2
2 a3 b3 y3
Rename the columns value_x
and value_y
a common name value
df1 = df1.rename(columns={'value_x' : 'value'}) <br>
df2 = df2.rename(columns={'value_y': 'value'})
Now concatenate the both DataFrames df1
and df2
ignoring index value
df3 = pd.concat([df1, df2], ignore_index=True)
index_a index_b value
0 a1 b1 x1
1 a2 b2 x2
2 a3 b3 x3
3 a1 b1 y1
4 a2 b2 y2
5 a3 b3 y3
Sort the rows by grouping columns index_a
and index_b
df3.sort_values(['index_a', 'index_b'])
Upvotes: 1
Reputation: 75080
Use str.contains()
for column names to find to filter the index columns and pass it under df.melt()
as id_vars
:
final=df.melt(df.columns[df.columns.str.contains('index')]).drop('variable',1)
index_a index_b value
0 a1 b1 x1
1 a2 b2 x2
2 a3 b3 x3
3 a1 b1 y1
4 a2 b2 y2
5 a3 b3 y3
Upvotes: 1