Reputation: 2552
Say i have a dataframe of stock prices with multi index, that i want to normalize into one dataframe, but with an extra column indicating which stock it belongs to. eg:
import pandas as pd
from pandas import Timestamp
tickers = ['TSLA','AAPL','MSFT']
# data below tuncated for example
new_data = pd.DataFrame({('AAPL',
'Open'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 128.7292938232422, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 127.69000244140625, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 127.54000091552734, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 128.31570434570312, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 128.7292938232422},
('AAPL',
'High'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 128.74000549316406, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 127.69999694824219, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 127.845703125, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 128.4199981689453, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 128.92999267578125},
('AAPL',
'Low'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 128.6199951171875, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 127.48999786376953, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 127.52999877929688, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 128.30999755859375, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 128.60000610351562},
('AAPL',
'Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 128.67999267578125, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 127.54000091552734, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 127.80999755859375, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 128.37989807128906, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 128.875},
('AAPL',
'Adj Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 128.67999267578125, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 127.54000091552734, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 127.80999755859375, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 128.37989807128906, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 128.875},
('AAPL',
'Volume'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 354071, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 1278221, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 1156859, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 383722, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 1283176},
('TSLA',
'Open'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 842.9199829101562, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 824.4307861328125, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 820.8300170898438, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 834.3262939453125, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 844.3800048828125},
('TSLA',
'High'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 844.5, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 824.919921875, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 824.9169921875, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 835.7000122070312, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 845.8494262695312},
('TSLA',
'Low'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 842.0, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 819.0999755859375, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 820.5632934570312, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 832.0999755859375, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 842.1199951171875},
('TSLA',
'Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 842.7081298828125, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 820.8599853515625, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 823.3134765625, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 835.1749877929688, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 843.85009765625},
('TSLA',
'Adj Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 842.7081298828125, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 820.8599853515625, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 823.3134765625, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 835.1749877929688, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 843.85009765625},
('TSLA',
'Volume'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 290117, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 900962, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 482207, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 331725, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 485683},
('MSFT',
'Open'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 213.8800048828125, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 213.39999389648438, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 213.14999389648438, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 213.07000732421875, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 213.55999755859375},
('MSFT',
'High'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 213.89500427246094, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 213.4499969482422, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 213.74000549316406, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 213.5500030517578, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 213.8800048828125},
('MSFT',
'Low'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 213.69009399414062, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 213.11900329589844, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 213.14999389648438, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 213.0399932861328, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 213.5},
('MSFT',
'Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 213.77999877929688, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 213.1699981689453, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 213.69000244140625, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 213.49000549316406, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 213.80499267578125},
('MSFT',
'Adj Close'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 213.77999877929688, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 213.1699981689453, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 213.69000244140625, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 213.49000549316406, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 213.80499267578125},
('MSFT',
'Volume'): {Timestamp('2021-01-15 13:40:00-0500', tz='America/New_York'): 80589, Timestamp('2021-01-15 15:05:00-0500', tz='America/New_York'): 147625, Timestamp('2021-01-15 15:10:00-0500', tz='America/New_York'): 294613, Timestamp('2021-01-15 13:05:00-0500', tz='America/New_York'): 151667, Timestamp('2021-01-15 11:20:00-0500', tz='America/New_York'): 284443}})
In [10]: new_data.columns
Out[10]:
MultiIndex([('AAPL', 'Open'),
('AAPL', 'High'),
('AAPL', 'Low'),
('AAPL', 'Close'),
('AAPL', 'Adj Close'),
('AAPL', 'Volume'),
('TSLA', 'Open'),
('TSLA', 'High'),
('TSLA', 'Low'),
('TSLA', 'Close'),
('TSLA', 'Adj Close'),
('TSLA', 'Volume'),
('MSFT', 'Open'),
('MSFT', 'High'),
('MSFT', 'Low'),
('MSFT', 'Close'),
('MSFT', 'Adj Close'),
('MSFT', 'Volume')],
)
Intended Result
Code for this example below.
combined = pd.read_json('{"Datetime":{"0":1610726400000,"1":1610726700000,"2":1610727000000,"3":1610727300000,"4":1610727600000},"Open":{"0":842.3516845703,"1":837.1900024414,"2":838.5949707031,"3":840.8345947266,"4":844.3800048828},"High":{"0":843.8900146484,"1":839.7600097656,"2":841.7399902344,"3":844.8800048828,"4":845.8494262695},"Low":{"0":836.5900268555,"1":835.0,"2":838.2700195312,"3":839.3001098633,"4":842.1199951172},"Close":{"0":837.1259765625,"1":838.5020141602,"2":841.0,"3":844.25,"4":843.8500976562},"Adj Close":{"0":837.1259765625,"1":838.5020141602,"2":841.0,"3":844.25,"4":843.8500976562},"Volume":{"0":0,"1":636383,"2":401164,"3":514418,"4":485683},"ticker":{"0":"TSLA","1":"TSLA","2":"TSLA","3":"TSLA","4":"TSLA"}}')
Current Method
Currently I'm manually looping through another list tickers
above, using .get()
to get the df with only that tickers columns, then appending to a list and re-concatenating everything as below
temp = []
for ticker in tickers:
stock_df = new_data.get(ticker)
if stock_df is not None:
stock_df['ticker'] = ticker
temp.append(stock_df.reset_index())
combined = pd.concat(temp, ignore_index=True)
Question - How to Improve?
This is incredibly inefficient, and i know there should be some magic pandas command that can do this at one go without loops. reset_index()
has not worked for me..
Thank you!
Upvotes: 0
Views: 85
Reputation: 71687
You can use .stack
on level=0
to reshape the dataframe, then .sort
and rename
the index:
new_data[tickers].stack(level=0).sort_index(level=[1, 0])\
.rename_axis(['Datetime', 'Ticker']).reset_index()
Datetime Ticker Adj Close Close High Low Open Volume
0 2021-01-15 11:20:00-05:00 AAPL 128.875000 128.875000 128.929993 128.600006 128.729294 1283176
1 2021-01-15 13:05:00-05:00 AAPL 128.379898 128.379898 128.419998 128.309998 128.315704 383722
2 2021-01-15 13:40:00-05:00 AAPL 128.679993 128.679993 128.740005 128.619995 128.729294 354071
3 2021-01-15 15:05:00-05:00 AAPL 127.540001 127.540001 127.699997 127.489998 127.690002 1278221
4 2021-01-15 15:10:00-05:00 AAPL 127.809998 127.809998 127.845703 127.529999 127.540001 1156859
5 2021-01-15 11:20:00-05:00 MSFT 213.804993 213.804993 213.880005 213.500000 213.559998 284443
6 2021-01-15 13:05:00-05:00 MSFT 213.490005 213.490005 213.550003 213.039993 213.070007 151667
7 2021-01-15 13:40:00-05:00 MSFT 213.779999 213.779999 213.895004 213.690094 213.880005 80589
8 2021-01-15 15:05:00-05:00 MSFT 213.169998 213.169998 213.449997 213.119003 213.399994 147625
9 2021-01-15 15:10:00-05:00 MSFT 213.690002 213.690002 213.740005 213.149994 213.149994 294613
10 2021-01-15 11:20:00-05:00 TSLA 843.850098 843.850098 845.849426 842.119995 844.380005 485683
11 2021-01-15 13:05:00-05:00 TSLA 835.174988 835.174988 835.700012 832.099976 834.326294 331725
12 2021-01-15 13:40:00-05:00 TSLA 842.708130 842.708130 844.500000 842.000000 842.919983 290117
13 2021-01-15 15:05:00-05:00 TSLA 820.859985 820.859985 824.919922 819.099976 824.430786 900962
14 2021-01-15 15:10:00-05:00 TSLA 823.313477 823.313477 824.916992 820.563293 820.830017 482207
Upvotes: 1