Reputation: 577
I have the following Pandas DataFrame with 50 columns, it consists of the daily closing spot price and option price of a few selected stocks, 25 stocks in total but I am only showing 3 here. The prices shown here are just an example:
date tsla_spot tsla_options aapl_spot aapl_options msft_spot msft_options
2020-01-01 350 23.02 257.21 3.45 170.32 3.56
2020-01-02 345.64 21.32 260.10 3.79 123.45 43.21
2020-01-03 345.12 20.43 262.12 3.90 123.54 45.32
I want to have the following pandas dataframe, but can't figure out what how... Instead, is it known as pivot?
date stock spot options
2020-01-01 tsla 350 23.02
2020-01-01 aapl 257.21 3.79
2020-01-01 msft 170.32 3.56
2020-01-02 tsla 345.64 21.32
2020-01-02 aapl 260.10 3.79
2020-01-02 msft 123.45 43.21
Thank you in adv!
Upvotes: 2
Views: 129
Reputation: 28644
You can use pd.wide_to_long
, by first reorganizing the columns:
pattern = r"(?P<first>\w+)_(?P<last>\w+)"
repl = lambda m: f"{m.group('last')}_{m.group('first')}"
df.columns = df.columns.str.replace(pattern, repl)
df.columns
Index(['date', 'spot_tsla', 'options_tsla', 'spot_aapl', 'options_aapl',
'spot_msft', 'options_msft'],
dtype='object')
Now, apply the wide_to_long function:
pd.wide_to_long(df,
stubnames=["spot", "options"],
i="date",
j="stock",
sep="_",
suffix=".+")
spot options
date stock
2020-01-01 tsla 350.00 23.02
2020-01-02 tsla 345.64 21.32
2020-01-03 tsla 345.12 20.43
2020-01-01 aapl 257.21 3.45
2020-01-02 aapl 260.10 3.79
2020-01-03 aapl 262.12 3.90
2020-01-01 msft 170.32 3.56
2020-01-02 msft 123.45 43.21
2020-01-03 msft 123.54 45.32
Another option is to use the pivot_longer function from pyjanitor :
# pip install pyjanitor
import janitor
df.pivot_longer(index="date",
names_to=("stock", ".value"),
names_sep="_")
date stock spot options
0 2020-01-01 tsla 350.00 23.02
1 2020-01-01 aapl 257.21 3.45
2 2020-01-01 msft 170.32 3.56
3 2020-01-02 tsla 345.64 21.32
4 2020-01-02 aapl 260.10 3.79
5 2020-01-02 msft 123.45 43.21
6 2020-01-03 tsla 345.12 20.43
7 2020-01-03 aapl 262.12 3.90
8 2020-01-03 msft 123.54 45.32
The .value
tells the function to make spot
and options
as new column names, and the rest become values in the stock
column.
Upvotes: 2
Reputation: 862511
Convert columns without separator to index, split columns names for MultiIndex
and reshape by DataFrame.stack
with DataFrame.rename_axis
for new columns names:
df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(['date', 'stock']).reset_index()
print (df)
date stock options spot
0 2020-01-01 aapl 3.45 257.21
1 2020-01-01 msft 3.56 170.32
2 2020-01-01 tsla 23.02 350.00
3 2020-01-02 aapl 3.79 260.10
4 2020-01-02 msft 43.21 123.45
5 2020-01-02 tsla 21.32 345.64
6 2020-01-03 aapl 3.90 262.12
7 2020-01-03 msft 45.32 123.54
8 2020-01-03 tsla 20.43 345.12
Upvotes: 1