user6400946
user6400946

Reputation: 116

How to stack a dataframe based on two columns

I have the following dataframe (test_df), and would like to stack the quote.BTC.* and quote.USD.* columns on top of each other and assign a label to them.

{'time_open': {0: '2021-07-02T00:00:00.000Z',
  1: '2021-07-03T00:00:00.000Z',
  2: '2021-07-04T00:00:00.000Z'},
 'time_close': {0: '2021-07-02T23:59:59.999Z',
  1: '2021-07-03T23:59:59.999Z',
  2: '2021-07-04T23:59:59.999Z'},
 'quote.BTC.open': {0: 0.000531186983930284,
  1: 0.0005377264271786289,
  2: 0.0005618907288594747},
 'quote.BTC.close': {0: 0.0005381508670811756,
  1: 0.0005631835764711054,
  2: 0.0005886421482917653},
 'quote.USD.open': {0: 17.83307192, 1: 18.22733883, 2: 19.47993593},
 'quote.USD.close': {0: 18.24172609, 1: 19.52475708, 2: 20.77187449}}

The output should look like this: example_output

I managed to do it with this code, but it seems extremely clumsy and not very generalized:

df_list = []
for asset in ["BTC", "USD"]:
    base_cols = ['time_open', 'time_close']
    # define list of columns I
    asset_cols = [
    f'quote.{asset}.open',
    f'quote.{asset}.close']
    base_cols.extend(asset_cols)

    # define dict of what col names should be renamed to
    col_dict = {
    f'quote.{asset}.open' : 'open',
    f'quote.{asset}.close' : 'close'}

    df_temp = test_df[base_cols].rename(columns=col_dict)
    df_temp["quote_asset"] = asset 
    df_list.append(df_temp)
print(pd.concat(df_list))

Is there a better way to do this?

Upvotes: 3

Views: 73

Answers (2)

sammywemmy
sammywemmy

Reputation: 28729

Another option, in addition to the excellent solutions by @ShubhamSharma, is pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index=['time_open', 'time_close'],
                names_to=("quote_asset", ".value"), 
                names_pattern=r".+\.(.+)\.(.+)")
 
                  time_open                time_close quote_asset       open      close
0  2021-07-02T00:00:00.000Z  2021-07-02T23:59:59.999Z         BTC   0.000531   0.000538
1  2021-07-03T00:00:00.000Z  2021-07-03T23:59:59.999Z         BTC   0.000538   0.000563
2  2021-07-04T00:00:00.000Z  2021-07-04T23:59:59.999Z         BTC   0.000562   0.000589
3  2021-07-02T00:00:00.000Z  2021-07-02T23:59:59.999Z         USD  17.833072  18.241726
4  2021-07-03T00:00:00.000Z  2021-07-03T23:59:59.999Z         USD  18.227339  19.524757
5  2021-07-04T00:00:00.000Z  2021-07-04T23:59:59.999Z         USD  19.479936  20.771874

The idea here is that any part of the columns associated with .value stays as a column, while the rest are collated into another column. These are determined by the groups in the regex pattern in names_pattern.

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71707

Simple pandas solution

  • Set the index to time_open and time_close
  • Split the columns around delimiter . and convert to MultiIndex by passing an optional argument expand=True
  • drop the unused level then stack on level=0 to reshape
s = df.set_index(['time_open', 'time_close'])
s.columns = s.columns.str.split('.', expand=True)
s = s.droplevel(0, axis=1).stack(0)

Result

print(s)
                                                           close       open
time_open                time_close                                        
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC   0.000538   0.000531
                                                  USD  18.241726  17.833072
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC   0.000563   0.000538
                                                  USD  19.524757  18.227339
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC   0.000589   0.000562
                                                  USD  20.771874  19.479936

Alternate approach with pd.wide_to_long

df.columns = df.columns.str.replace(r'quote\.(.*?)\.(.*)', r'\2_\1')
pd.wide_to_long(df, i=['time_open', 'time_close'], j='quote', 
                stubnames=['open', 'close'], sep='_', suffix='\w+')

Result

                                                              open      close
time_open                time_close               quote                      
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC     0.000531   0.000538
                                                  USD    17.833072  18.241726
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC     0.000538   0.000563
                                                  USD    18.227339  19.524757
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC     0.000562   0.000589
                                                  USD    19.479936  20.771874

Upvotes: 3

Related Questions