Reputation: 3416
I have a bunch of minute data in a single df, that I wan't to split into individual days so I can trade each day by itself, but I'm struggling to access the columns properly:
Reading from sql into a pandas df:
df = pd.read_sql_query("SELECT * from ohlc_minutes", conn)
print(df)
I'm then making my timestamp column a datetime:
df['timestamp'] = pd.to_datetime(df['timestamp'])
groupby to split each block of minutes data into different days:
seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])
But when I go to loop over:
for ohlc in seperate_days:
display(ohlc)
print(ohlc['symbol'])
I get:
timestamp open high low close volume trade_count vwap symbol
0 2021-10-13 08:00:00+00:00 140.20 140.40 140.000 140.40 6084 65 140.205417 AAPL
1 2021-10-13 08:01:00+00:00 140.35 140.40 140.200 140.40 3052 58 140.308182 AAPL
2 2021-10-13 08:02:00+00:00 140.35 140.35 140.350 140.35 632 30 140.320934 AAPL
3 2021-10-13 08:03:00+00:00 140.28 140.30 140.200 140.20 2867 36 140.279473 AAPL
4 2021-10-13 08:04:00+00:00 140.20 140.20 140.200 140.20 435 36 140.199195 AAPL
... ... ... ... ... ... ... ... ... ...
58250 2021-10-27 19:58:00+00:00 209.31 209.33 209.215 209.26 26440 348 209.251852 ZTS
58251 2021-10-27 19:59:00+00:00 209.28 209.59 209.010 209.56 109758 1060 209.384672 ZTS
58252 2021-10-27 20:03:00+00:00 209.58 209.58 209.580 209.58 537786 49 209.580000 ZTS
58253 2021-10-27 20:05:00+00:00 209.58 209.58 209.580 209.58 4170 1 209.580000 ZTS
58254 2021-10-27 20:12:00+00:00 209.58 209.58 209.580 209.58 144 1 209.580000 ZTS
[58255 rows x 9 columns]
(('AAPL', datetime.date(2021, 10, 13)), timestamp open high low close volume trade_count vwap symbol
0 2021-10-13 08:00:00+00:00 140.20 140.40 140.00 140.40 6084 65 140.205417 AAPL
1 2021-10-13 08:01:00+00:00 140.35 140.40 140.20 140.40 3052 58 140.308182 AAPL
2 2021-10-13 08:02:00+00:00 140.35 140.35 140.35 140.35 632 30 140.320934 AAPL
3 2021-10-13 08:03:00+00:00 140.28 140.30 140.20 140.20 2867 36 140.279473 AAPL
4 2021-10-13 08:04:00+00:00 140.20 140.20 140.20 140.20 435 36 140.199195 AAPL
.. ... ... ... ... ... ... ... ... ...
819 2021-10-13 23:55:00+00:00 141.18 141.18 141.18 141.18 577 7 141.179428 AAPL
820 2021-10-13 23:56:00+00:00 141.20 141.20 141.20 141.20 623 13 141.198684 AAPL
821 2021-10-13 23:57:00+00:00 141.22 141.23 141.22 141.23 1635 21 141.219070 AAPL
822 2021-10-13 23:58:00+00:00 141.22 141.24 141.20 141.24 1575 14 141.227410 AAPL
823 2021-10-13 23:59:00+00:00 141.23 141.24 141.22 141.24 2541 26 141.233625 AAPL
[824 rows x 9 columns])
Traceback (most recent call last):
File "/home/dan/Documents/code/wolfhound/vectorbt_minutes.py", line 48, in <module>
print(ohlc['symbol'])
TypeError: tuple indices must be integers or slices, not str
Any ideas what I'm doing wrong? I just want to be able to access the columns.
Upvotes: 1
Views: 51
Reputation: 30002
DataFrame.groupby
returns as list of tuple, you can do
for name, ohlc in seperate_days:
display(ohlc)
print(ohlc['symbol'])
Upvotes: 2