a7dc
a7dc

Reputation: 3416

How to access columns when using groupby?

I have a bunch of minute data in a single df, that I wan't to split into individual days so I can trade each day by itself, but I'm struggling to access the columns properly:

Reading from sql into a pandas df:

df = pd.read_sql_query("SELECT * from ohlc_minutes", conn)

print(df)

I'm then making my timestamp column a datetime:

df['timestamp'] = pd.to_datetime(df['timestamp'])

groupby to split each block of minutes data into different days:

seperate_days = df.groupby(['symbol', df['timestamp'].dt.date])

But when I go to loop over:

for ohlc in seperate_days:
    display(ohlc)
    print(ohlc['symbol'])

I get:

    timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-13 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-13 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-13 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-13 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-13 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL
...                          ...     ...     ...      ...     ...     ...          ...         ...    ...
58250  2021-10-27 19:58:00+00:00  209.31  209.33  209.215  209.26   26440          348  209.251852    ZTS
58251  2021-10-27 19:59:00+00:00  209.28  209.59  209.010  209.56  109758         1060  209.384672    ZTS
58252  2021-10-27 20:03:00+00:00  209.58  209.58  209.580  209.58  537786           49  209.580000    ZTS
58253  2021-10-27 20:05:00+00:00  209.58  209.58  209.580  209.58    4170            1  209.580000    ZTS
58254  2021-10-27 20:12:00+00:00  209.58  209.58  209.580  209.58     144            1  209.580000    ZTS

[58255 rows x 9 columns]
(('AAPL', datetime.date(2021, 10, 13)),                     timestamp    open    high     low   close  volume  trade_count        vwap symbol
0   2021-10-13 08:00:00+00:00  140.20  140.40  140.00  140.40    6084           65  140.205417   AAPL
1   2021-10-13 08:01:00+00:00  140.35  140.40  140.20  140.40    3052           58  140.308182   AAPL
2   2021-10-13 08:02:00+00:00  140.35  140.35  140.35  140.35     632           30  140.320934   AAPL
3   2021-10-13 08:03:00+00:00  140.28  140.30  140.20  140.20    2867           36  140.279473   AAPL
4   2021-10-13 08:04:00+00:00  140.20  140.20  140.20  140.20     435           36  140.199195   AAPL
..                        ...     ...     ...     ...     ...     ...          ...         ...    ...
819 2021-10-13 23:55:00+00:00  141.18  141.18  141.18  141.18     577            7  141.179428   AAPL
820 2021-10-13 23:56:00+00:00  141.20  141.20  141.20  141.20     623           13  141.198684   AAPL
821 2021-10-13 23:57:00+00:00  141.22  141.23  141.22  141.23    1635           21  141.219070   AAPL
822 2021-10-13 23:58:00+00:00  141.22  141.24  141.20  141.24    1575           14  141.227410   AAPL
823 2021-10-13 23:59:00+00:00  141.23  141.24  141.22  141.24    2541           26  141.233625   AAPL

[824 rows x 9 columns])
Traceback (most recent call last):
  File "/home/dan/Documents/code/wolfhound/vectorbt_minutes.py", line 48, in <module>
    print(ohlc['symbol'])
TypeError: tuple indices must be integers or slices, not str

Any ideas what I'm doing wrong? I just want to be able to access the columns.

Upvotes: 1

Views: 51

Answers (1)

Ynjxsjmh
Ynjxsjmh

Reputation: 30002

DataFrame.groupby returns as list of tuple, you can do

for name, ohlc in seperate_days:
    display(ohlc)
    print(ohlc['symbol'])

Upvotes: 2

Related Questions