swifty

Reputation: 1282

Diagnosing intermittent/"lagged" Websocket data collection

I'm collecting trade data from a wide variety of cryptocurrency exchanges via websockets and storing it in .csv files. This has been working for all exchanges for two weeks now, except for my Bitfinex script below.

import websocket
import pandas as pd
import json
import time
import datetime
import os

df = pd.DataFrame(columns=['id','time','amount','price'])
trades = []
folder = r'/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_'
def on_message(ws, message):
    msg = json.loads(message)
    if msg[1] == 'te':
        print("Bitfinex BTCUSD Trades - "+str(msg[2]))
        global df
        trades.append(msg[2])
        df = pd.DataFrame(trades)
        df = df[-1:]
        df = df.drop(0,axis=1)
        if not os.path.isfile(folder + str(datetime.datetime.today().strftime('%Y_%m_%d') + '.csv')):
            df.to_csv(folder + str(datetime.datetime.today().strftime('%Y_%m_%d') + '.csv'),header ='column_names', index=False)
        else: # else it exists so append without writing the header
            df.to_csv(folder + str(datetime.datetime.today().strftime('%Y_%m_%d') + '.csv'),mode = 'a',header=False, index=False)

def on_error(ws, error):
    print(error)

def on_close(ws):
    print("### closed ###")

def on_open(ws):
    ws.send(json.dumps({"event":"subscribe", "channel":"trades", "pair":"BTCUSD"}))

if __name__ == "__main__":
    # Reconnect whenever run_forever() returns (e.g. after the connection drops).
    while True:
        ws = websocket.WebSocketApp("wss://api.bitfinex.com/ws/2",
                                    on_message=on_message,
                                    on_error=on_error,
                                    on_close=on_close)
        ws.on_open = on_open
        ws.run_forever()

The output of this over multiple days shows the problem: after running perfectly fine for the first 3.5 days, it begins to intermittently miss hours at a time. I'm at a loss as to why or how to fix it, especially considering that my other, identical scripts for other exchanges are all working.

Because the problem is infrequent and the script can run for hours without trouble, I don't have example errors, messages, or clues as to what is happening when it isn't recording. I will provide as much information as I can.

I'm hoping someone with more experience might have a suggestion as to how to fix this. I don't mind missing a few seconds here and there if there is a hacky workaround. Could I just automatically close/restart the script every 24h? How would I go about doing that?

Thanks.

Upvotes: 2

Views: 575

Answers (1)

Douglas Lopez

Reputation: 396

The network performance of t2.medium instances is poor, and their CPU is burstable: once you use up all your CPU credits, performance is throttled and the system becomes slow. Consider using an instance type with better performance.
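One thing that may push CPU usage up over time in the question's script is that `on_message` rebuilds a DataFrame from the entire `trades` list on every message, so the work per trade grows with uptime. Below is a minimal sketch of appending only the newest trade with the standard `csv` module instead. The function names and the `HEADER` column names are my own assumptions, not part of the original script; the sketch also assumes each trade is a list whose first element is the trade id, which the original drops.

```python
import csv
import datetime
import os

# Hypothetical column names: the question's script drops the trade id and
# keeps the remaining three fields of each 'te' payload.
HEADER = ("time", "amount", "price")


def daily_path(prefix, day):
    """Build the per-day file name in the same style as the question."""
    return prefix + day.strftime("%Y_%m_%d") + ".csv"


def append_trade(prefix, trade, day=None):
    """Append one trade to the day's CSV, writing the header only for a new file."""
    day = day or datetime.date.today()
    path = daily_path(prefix, day)
    is_new = not os.path.isfile(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow(trade[1:])  # skip the trade id, as the original does
    return path
```

Inside `on_message` this would be called as `append_trade(folder, msg[2])`, which keeps the cost per message constant and avoids holding every trade in memory.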

To diagnose the packet loss, the best approach is to enable VPC Flow Logs. They act somewhat like a sniffer placed on your network, so you can track the traffic to and from the instance. To set them up, see https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs.html
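On the script side, it may also help to record exactly when the feed goes quiet, so the gaps can be lined up with the instance's metrics. Here is a minimal sketch of such a monitor. The class name `GapMonitor` and the 60-second default threshold are assumptions of mine; pick a threshold that suits how often the pair actually trades.

```python
import time


class GapMonitor:
    """Record stretches where no message arrived for longer than max_gap seconds."""

    def __init__(self, max_gap=60.0, clock=time.monotonic):
        self.max_gap = max_gap
        self.clock = clock  # injectable so the class can be tested without waiting
        self.last_seen = clock()
        self.gaps = []  # list of (gap_start, gap_length_seconds)

    def mark(self):
        """Call once per received message; returns the gap length if one just ended."""
        now = self.clock()
        elapsed = now - self.last_seen
        gap = None
        if elapsed > self.max_gap:
            gap = elapsed
            self.gaps.append((self.last_seen, elapsed))
        self.last_seen = now
        return gap

    def is_stale(self):
        """True if the feed has been silent for longer than max_gap right now."""
        return self.clock() - self.last_seen > self.max_gap
```

Calling `monitor.mark()` at the top of `on_message` and printing any non-`None` result gives a log of when data stopped and for how long. If the gaps turn out to be dead connections, passing `ping_interval` and `ping_timeout` to `ws.run_forever()` should make the client notice sooner and return, so that the surrounding loop can reconnect.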

Upvotes: 3
