BBQuercus
BBQuercus

Reputation: 879

Pandas - Split columns into rows while keeping indices

I have the following simplified DataFrame:

import pandas as pd

pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value_x':'x1', 'value_y':'y1'},
              {'index_a':'a2', 'index_b':'b2', 'value_x':'x2', 'value_y':'y2'},
              {'index_a':'a3', 'index_b':'b3', 'value_x':'x3', 'value_y':'y3'}])

It contains two indices and two value columns. For downstream usage, it does not make sense to have two value columns (they are from the same distribution). I therefore want to 'explode' these columns and make one large list. This is what should result:

pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value':'x1'},
              {'index_a':'a1', 'index_b':'b1', 'value':'x1'},
              {'index_a':'a2', 'index_b':'b2', 'value':'x2'},
              {'index_a':'a2', 'index_b':'b2', 'value':'y2'},
              {'index_a':'a3', 'index_b':'b3', 'value':'x3'},
              {'index_a':'a3', 'index_b':'b3', 'value':'y3'}])

I tried isolating values via .value and .ravel() but none yielded the desired results.

Thanks in advance. BBQuercus :)

Upvotes: 1

Views: 151

Answers (2)

user7606883
user7606883

Reputation: 11

If you run the code below
import pandas as pd df = pd.DataFrame([{'index_a':'a1', 'index_b':'b1', 'value_x':'x1', 'value_y':'y1'}, {'index_a':'a2', 'index_b':'b2', 'value_x':'x2', 'value_y':'y2'}, {'index_a':'a3', 'index_b':'b3', 'value_x':'x3', 'value_y':'y3'}])
The result will be

index_a index_b value_x value_y 0 a1 b1 x1 y1 1 a2 b2 x2 y2 2 a3 b3 x3 y3

So here you can select columns ['index_a', 'index_b', 'value_x']

df1 = df[['index_a', 'index_b', 'value_x']]

index_a index_b value_x 0 a1 b1 x1 1 a2 b2 x2 2 a3 b3 x3

Similarly select columns ['index_a', 'index_b', 'value_y']

df2 = df[['index_a', 'index_b', 'value_y']]

index_a index_b value_y 0 a1 b1 y1 1 a2 b2 y2 2 a3 b3 y3

Rename the columns value_x and value_y a common name value

df1 = df1.rename(columns={'value_x' : 'value'}) <br> df2 = df2.rename(columns={'value_y': 'value'})

Now concatenate the both DataFrames df1 and df2 ignoring index value

df3 = pd.concat([df1, df2], ignore_index=True)

index_a index_b value 0 a1 b1 x1 1 a2 b2 x2 2 a3 b3 x3 3 a1 b1 y1 4 a2 b2 y2 5 a3 b3 y3

Sort the rows by grouping columns index_a and index_b

df3.sort_values(['index_a', 'index_b'])

Upvotes: 1

anky
anky

Reputation: 75080

Use str.contains() for column names to find to filter the index columns and pass it under df.melt() as id_vars:

final=df.melt(df.columns[df.columns.str.contains('index')]).drop('variable',1)

  index_a index_b value
0      a1      b1    x1
1      a2      b2    x2
2      a3      b3    x3
3      a1      b1    y1
4      a2      b2    y2
5      a3      b3    y3

Upvotes: 1

Related Questions