How to stack a dataframe based on two columns

Question

I have the following dataframe (test_df), and would like to stack the quote.BTC.* and quote.USD.* columns on top of each other and assign a label to them.

{'time_open': {0: '2021-07-02T00:00:00.000Z',
  1: '2021-07-03T00:00:00.000Z',
  2: '2021-07-04T00:00:00.000Z'},
 'time_close': {0: '2021-07-02T23:59:59.999Z',
  1: '2021-07-03T23:59:59.999Z',
  2: '2021-07-04T23:59:59.999Z'},
 'quote.BTC.open': {0: 0.000531186983930284,
  1: 0.0005377264271786289,
  2: 0.0005618907288594747},
 'quote.BTC.close': {0: 0.0005381508670811756,
  1: 0.0005631835764711054,
  2: 0.0005886421482917653},
 'quote.USD.open': {0: 17.83307192, 1: 18.22733883, 2: 19.47993593},
 'quote.USD.close': {0: 18.24172609, 1: 19.52475708, 2: 20.77187449}}

The output should look like this:

I managed to do it with this code, but it seems extremely clumsy and not very generalized:

df_list = []
for asset in ["BTC", "USD"]:
    base_cols = ['time_open', 'time_close']
    # define list of columns I
    asset_cols = [
    f'quote.{asset}.open',
    f'quote.{asset}.close']
    base_cols.extend(asset_cols)

    # define dict of what col names should be renamed to
    col_dict = {
    f'quote.{asset}.open' : 'open',
    f'quote.{asset}.close' : 'close'}

    df_temp = test_df[base_cols].rename(columns=col_dict)
    df_temp["quote_asset"] = asset 
    df_list.append(df_temp)
print(pd.concat(df_list))

Is there a better way to do this?

Shubham Sharma · Accepted Answer

Simple pandas solution

Set the index to time_open and time_close
Split the columns around delimiter . and convert to MultiIndex by passing an optional argument expand=True
drop the unused level then stack on level=0 to reshape

s = df.set_index(['time_open', 'time_close'])
s.columns = s.columns.str.split('.', expand=True)
s = s.droplevel(0, axis=1).stack(0)

Result

print(s)
                                                           close       open
time_open                time_close                                        
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC   0.000538   0.000531
                                                  USD  18.241726  17.833072
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC   0.000563   0.000538
                                                  USD  19.524757  18.227339
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC   0.000589   0.000562
                                                  USD  20.771874  19.479936

Alternate approach with `pd.wide_to_long`

df.columns = df.columns.str.replace(r'quote\.(.*?)\.(.*)', r'\2_\1')
pd.wide_to_long(df, i=['time_open', 'time_close'], j='quote', 
                stubnames=['open', 'close'], sep='_', suffix='\w+')

Result

                                                              open      close
time_open                time_close               quote                      
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC     0.000531   0.000538
                                                  USD    17.833072  18.241726
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC     0.000538   0.000563
                                                  USD    18.227339  19.524757
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC     0.000562   0.000589
                                                  USD    19.479936  20.771874

How to stack a dataframe based on two columns

Answers (2)

Simple pandas solution

Alternate approach with `pd.wide_to_long`

Related Questions

How to stack a dataframe based on two columns

Answers (2)

Simple pandas solution

Alternate approach with pd.wide_to_long

Related Questions

Alternate approach with `pd.wide_to_long`