Gaetano
Gaetano

Reputation: 1

How to change print into DataFrame in Pandas correctly?

I am successfully requested multiple stocks real-time data from realtimeBar from ibapi library. Like this code:

    def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
        print(reqId, time, open, high, low, close, volume, wap, count)

Output looks like this:

1 1647459265 158.06 158.09 158.04 158.08 175 158.06 82
0 1647459265 832.66 832.86 832.44 832.66 4 832.64 4
2 1647459265 14.89 14.9 14.89 14.9 7 14.896 5
1 1647459270 158.08 158.14 158.08 158.13 76 158.11 43
2 1647459270 14.91 14.9237 14.91 14.91 65 14.9179 46
0 1647459270 832.98 833.7 832.98 833.36 36 833.48 12

And it goes forever until I stopped the script. I have a problem to insert/adapt this print in DataFrame from pandas. How should I do this? Could you show me a snippet, please? I was trying different options but different errors occurred, that's why I don't show my error type here. I want to achieve table like this:

0 1647459265 832.66 832.86 832.44 832.66 4 832.64 4 1 1647459265 158.06 158.09 158.04 158.08 175 158.06 82 2 1647459265 14.89 14.9 14.89 14.9 7 14.896 5
0 1647459270 832.98 833.7 832.98 833.36 36 833.48 12 1 1647459270 158.08 158.14 158.08 158.13 76 158.11 43 2 1647459270 14.91 14.9237 14.91 14.91 65 14.9179 46  

The goal of this DataFrame is to sort real time (every 5 sec) streaming data by reqId (because this is different stocks in fact), and make calculations (like average price, percentage change, etc.) on let's say 10 last bar (recognized by unixtime) I received. Should I use MultiIndex? How? Or multiple DataFrames? Or maybe do you have another method, library, solution? Any help I would highly appreciated :-)

Upvotes: 0

Views: 345

Answers (1)

ibmx
ibmx

Reputation: 173

If I understood correctly, every 5 seconds you get values from reqId, time, open, high, low, close, volume, wap, count. Then, you want to pass them to a dataframe and sort by reqId and put all values with the same time on a single row.

I don't get why you want to combine the values with the same time on a single row, but I suspect you don't need to do it to achieve what you want.

I think this may help:

def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
   df = pd.DataFrame(
       [[reqId, reqId, time, open, high, low, close, volume, wap, count]], colums = ['reqId', 'time', 'open', 'high', 'low', 'close', 'volume', 'wap', 'count'])
   df['time'] = pd.to_datetime(df['time'],unit='s')
   return df

This will get the stream values into a dataframe and convert the unixtime to a more friendly datetime format.

Then, you can calculate whatever you want for a given frequency. For example, for your example dataframe, we can calculate the mean of every column every minute:


df.groupby(['reqId', pd.Grouper(key='time', freq='30s')]).mean()

                            open    high         low    close   volume  count
reqId   time                        
0      2022-03-16 19:34:00  832.82  833.28000   832.71  833.010 20.0    8.0
1      2022-03-16 19:34:00  158.08  158.14000   158.08  158.130 76.0    43.0
2      2022-03-16 19:34:00  14.90   14.91185    14.90   14.905  36.0    25.5
1      2022-03-16 19:34:00  158.06  158.09000   158.04  158.080 175.0   82.0

After your comment, if I understood correctly, you want (or that is already happening, not sure here) to have your stream_data function called every 5 seconds and spit out new rows. But you are re-creating every time a dataframe with these rows that come from stream_data, and you want these lines added to a dataframe that has all the rows that stream_data has returned before.

You can just create an empty dataframe and then keep appending new rows to it. Then, you filter the rows you want do do your calculations as I suggested above.

df = pd.DataFrame
def stream_data(self, reqId, time, open, high, low, close, volume, wap, count):
   new_rows = pd.DataFrame(
       [[reqId, reqId, time, open, high, low, close, volume, wap, count]], colums = ['reqId', 'time', 'open', 'high', 'low', 'close', 'volume', 'wap', 'count'])
   new_rows['time'] = pd.to_datetime(df['time'],unit='s')
   return new_rows

df = pd.concat([df, new_rows])

Upvotes: 0

Related Questions