Reputation: 1
I am successfully requesting real-time data for multiple stocks via realtimeBar from the ibapi library, with code like this:
def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
    print(reqId, time, open, high, low, close, volume, wap, count)
Output looks like this:
1 1647459265 158.06 158.09 158.04 158.08 175 158.06 82
0 1647459265 832.66 832.86 832.44 832.66 4 832.64 4
2 1647459265 14.89 14.9 14.89 14.9 7 14.896 5
1 1647459270 158.08 158.14 158.08 158.13 76 158.11 43
2 1647459270 14.91 14.9237 14.91 14.91 65 14.9179 46
0 1647459270 832.98 833.7 832.98 833.36 36 833.48 12
It keeps streaming until I stop the script. My problem is getting this printed output into a pandas DataFrame. How should I do this? Could you show me a snippet, please? I tried different approaches but got a different error each time, which is why I am not showing a specific error here. I want to achieve a table like this:
0 1647459265 832.66 832.86 832.44 832.66 4 832.64 4 1 1647459265 158.06 158.09 158.04 158.08 175 158.06 82 2 1647459265 14.89 14.9 14.89 14.9 7 14.896 5
0 1647459270 832.98 833.7 832.98 833.36 36 833.48 12 1 1647459270 158.08 158.14 158.08 158.13 76 158.11 43 2 1647459270 14.91 14.9237 14.91 14.91 65 14.9179 46
The goal of this DataFrame is to sort the real-time (every 5 sec) streaming data by reqId (because these are in fact different stocks) and to make calculations (like average price, percentage change, etc.) on, let's say, the last 10 bars (identified by unixtime) that I received. Should I use a MultiIndex? How? Or multiple DataFrames? Or do you have another method, library, or solution? Any help would be highly appreciated :-)
Upvotes: 0
Views: 345
Reputation: 173
If I understood correctly, every 5 seconds you get values for reqId, time, open, high, low, close, volume, wap, count. Then you want to pass them to a dataframe, sort by reqId, and put all values with the same time on a single row.
I don't get why you want to combine the values with the same time on a single row, but I suspect you don't need to do that to achieve what you want.
I think this may help:
import pandas as pd

def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
    df = pd.DataFrame(
        [[reqId, time, open, high, low, close, volume, wap, count]],
        columns=['reqId', 'time', 'open', 'high', 'low', 'close', 'volume', 'wap', 'count'])
    # convert the unix timestamp to a pandas datetime
    df['time'] = pd.to_datetime(df['time'], unit='s')
    return df
This will get the stream values into a dataframe and convert the unixtime to a more friendly datetime format.
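For example (just an illustration: here I call the function directly and pass None for self, whereas in your app ibapi calls it on your wrapper), the first bar from your output would come back as a one-row dataframe:
df = stream_data(None, 1, 1647459265, 158.06, 158.09, 158.04, 158.08, 175, 158.06, 82)
print(df)
#    reqId                time    open    high     low   close  volume     wap  count
# 0      1 2022-03-16 19:34:25  158.06  158.09  158.04  158.08     175  158.06     82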
Then you can calculate whatever you want for a given frequency. For example, with your sample data we can calculate the mean of every column per minute:
df.groupby(['reqId', pd.Grouper(key='time', freq='1min')]).mean()

                             open       high     low    close  volume  count
reqId time
0     2022-03-16 19:34:00  832.82  833.28000  832.71  833.010    20.0    8.0
1     2022-03-16 19:34:00  158.07  158.11500  158.06  158.105   125.5   62.5
2     2022-03-16 19:34:00   14.90   14.91185   14.90   14.905    36.0   25.5
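To get at your original goal (statistics on the last 10 bars per stock), here is a minimal sketch, assuming df is the accumulated dataframe built as above and that the percentage change you want is from the oldest to the newest of those bars:
# keep only the 10 most recent bars of each reqId, then aggregate per reqId
last10 = df.sort_values('time').groupby('reqId').tail(10)
stats = last10.groupby('reqId').agg(
    avg_close=('close', 'mean'),
    pct_change=('close', lambda s: (s.iloc[-1] / s.iloc[0] - 1) * 100),
)
print(stats)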
After your comment, if I understood correctly, your stream_data function is called every 5 seconds (or you want it to be) and spits out new rows. But you are re-creating a dataframe from scratch each time with only those rows, whereas you want them appended to a dataframe that holds everything stream_data has returned so far.
You can just create an empty dataframe and keep appending new rows to it. Then you filter the rows you want and do your calculations as I suggested above.
import pandas as pd

df = pd.DataFrame()  # empty dataframe that accumulates all bars

def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
    global df
    new_rows = pd.DataFrame(
        [[reqId, time, open, high, low, close, volume, wap, count]],
        columns=['reqId', 'time', 'open', 'high', 'low', 'close', 'volume', 'wap', 'count'])
    new_rows['time'] = pd.to_datetime(new_rows['time'], unit='s')
    df = pd.concat([df, new_rows], ignore_index=True)
    return df
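One caveat, not part of your question: pd.concat copies the whole dataframe on every call, which can get slow if the stream runs for a long time. A common alternative (just a sketch; bars_dataframe is a hypothetical helper name) is to append the raw bars to a plain list and only build the dataframe when you actually need to run calculations:
import pandas as pd

rows = []  # plain list of bars; appending here is cheap

def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
    rows.append([reqId, time, open, high, low, close, volume, wap, count])

def bars_dataframe():
    # build the dataframe on demand, e.g. right before the groupby calculations
    df = pd.DataFrame(rows, columns=['reqId', 'time', 'open', 'high', 'low',
                                     'close', 'volume', 'wap', 'count'])
    df['time'] = pd.to_datetime(df['time'], unit='s')
    return df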
Upvotes: 0