Reputation: 116
I have the following dataframe (test_df), and would like to stack the quote.BTC.* and quote.USD.* columns on top of each other and assign a label to them.
{'time_open': {0: '2021-07-02T00:00:00.000Z',
1: '2021-07-03T00:00:00.000Z',
2: '2021-07-04T00:00:00.000Z'},
'time_close': {0: '2021-07-02T23:59:59.999Z',
1: '2021-07-03T23:59:59.999Z',
2: '2021-07-04T23:59:59.999Z'},
'quote.BTC.open': {0: 0.000531186983930284,
1: 0.0005377264271786289,
2: 0.0005618907288594747},
'quote.BTC.close': {0: 0.0005381508670811756,
1: 0.0005631835764711054,
2: 0.0005886421482917653},
'quote.USD.open': {0: 17.83307192, 1: 18.22733883, 2: 19.47993593},
'quote.USD.close': {0: 18.24172609, 1: 19.52475708, 2: 20.77187449}}
The output should look like this:
I managed to do it with this code, but it seems extremely clumsy and not very generalized:
df_list = []
for asset in ["BTC", "USD"]:
base_cols = ['time_open', 'time_close']
# define list of columns I
asset_cols = [
f'quote.{asset}.open',
f'quote.{asset}.close']
base_cols.extend(asset_cols)
# define dict of what col names should be renamed to
col_dict = {
f'quote.{asset}.open' : 'open',
f'quote.{asset}.close' : 'close'}
df_temp = test_df[base_cols].rename(columns=col_dict)
df_temp["quote_asset"] = asset
df_list.append(df_temp)
print(pd.concat(df_list))
Is there a better way to do this?
Upvotes: 3
Views: 73
Reputation: 28729
Another option, in addition to the excellent solutions by @ShubhamSharma, is pivot_longer from pyjanitor:
# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index=['time_open', 'time_close'],
names_to=("quote_asset", ".value"),
names_pattern=r".+\.(.+)\.(.+)")
time_open time_close quote_asset open close
0 2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC 0.000531 0.000538
1 2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC 0.000538 0.000563
2 2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC 0.000562 0.000589
3 2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z USD 17.833072 18.241726
4 2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z USD 18.227339 19.524757
5 2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z USD 19.479936 20.771874
The idea here is that any part of the columns associated with .value
stays as a column, while the rest are collated into another column. These are determined by the groups in the regex pattern in names_pattern
.
Upvotes: 2
Reputation: 71707
time_open
and time_close
.
and convert to MultiIndex by passing an optional argument expand=True
drop
the unused level then stack
on level=0
to reshapes = df.set_index(['time_open', 'time_close'])
s.columns = s.columns.str.split('.', expand=True)
s = s.droplevel(0, axis=1).stack(0)
Result
print(s)
close open
time_open time_close
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC 0.000538 0.000531
USD 18.241726 17.833072
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC 0.000563 0.000538
USD 19.524757 18.227339
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC 0.000589 0.000562
USD 20.771874 19.479936
pd.wide_to_long
df.columns = df.columns.str.replace(r'quote\.(.*?)\.(.*)', r'\2_\1')
pd.wide_to_long(df, i=['time_open', 'time_close'], j='quote',
stubnames=['open', 'close'], sep='_', suffix='\w+')
Result
open close
time_open time_close quote
2021-07-02T00:00:00.000Z 2021-07-02T23:59:59.999Z BTC 0.000531 0.000538
USD 17.833072 18.241726
2021-07-03T00:00:00.000Z 2021-07-03T23:59:59.999Z BTC 0.000538 0.000563
USD 18.227339 19.524757
2021-07-04T00:00:00.000Z 2021-07-04T23:59:59.999Z BTC 0.000562 0.000589
USD 19.479936 20.771874
Upvotes: 3