I've been web scraping for a long time and recently decided to scrape a video stream delivered over a WebSocket. I understand WebSockets and how they work, but I don't fully understand the streaming part. Using Python 3.10, I'm scraping a stream that sends base64-encoded data, and when I decode it the result is unreadable (precisely because it is video-stream data). The stream belongs to a company that provides weather data, and I need to extract that data without Selenium or any other browser-automation/testing library. Is there an effective way to do this? Perhaps a well-performing library, or some way to "read" the data from the stream?
Here is a screenshot of the data received over the WebSocket:
Even after decoding the base64 payload and trying to read the result as UTF-8, the output looks the same as in the screenshot above.
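Base64 decoding produces raw bytes, not text; if the stream carries video or images, those bytes are binary media and will never read as UTF-8. A more useful first step is to decode to bytes and check which format they are. The sketch below does that by comparing the first bytes against a few well-known file signatures. The helper name `sniff_payload` and the signature list are illustrative assumptions, not something taken from the stream in question.

```python
import base64
import binascii

# A few well-known "magic number" prefixes that a video/image stream might
# carry. Illustrative only, not an exhaustive list.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x1a\x45\xdf\xa3", "Matroska/WebM container"),
    (b"\x1f\x8b", "gzip-compressed data"),
]


def sniff_payload(b64_text: str) -> str:
    """Decode a base64 string to raw bytes and guess what the bytes are."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64_text, validate=True)
    except binascii.Error:
        return "not valid base64"
    for magic, name in SIGNATURES:
        if raw.startswith(magic):
            return name
    # MP4 / fragmented-MP4 boxes store their 4-byte type at offset 4.
    if raw[4:8] in (b"ftyp", b"moof", b"styp"):
        return "MP4 (ISO BMFF) box"
    # MPEG-TS packets start with the sync byte 0x47 (weak check on its own).
    if raw[:1] == b"\x47":
        return "possibly MPEG-TS"
    try:
        raw.decode("utf-8")
        return "UTF-8 text"
    except UnicodeDecodeError:
        return f"unknown binary ({raw[:8].hex()}...)"
```

If the result is a known container or image format, the decoded bytes can be saved as-is and opened with a media tool instead of being treated as text.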
I can recommend this package: https://github.com/websocket-client/websocket-client
It is simple and stable, and it has worked flawlessly for me. It also supports asyncio.
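import websocket  # pip install websocket-client

def on_message(ws, message):
    # Called for every frame the server sends (str for text, bytes for binary).
    print("received:", message[:80])

def on_open(ws):
    # Called once the handshake completes; send any initial command here.
    print("connection opened")

def on_close(ws, close_status_code, close_msg):
    print("connection closed:", close_status_code, close_msg)

def on_error(ws, error):
    print("error:", error)

ws = websocket.WebSocketApp(
    "wss://<address>",
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
    on_close=on_close,
)
ws.run_forever()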
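Usually, when scraping a WebSocket, you need to start the process by sending some command first. You can find that command in the browser's DevTools (Network tab, WS filter, then the Messages view of the connection); messages sent by the client are marked with a green up arrow. You can then reproduce it with ws.send("<message>"), for example inside on_open.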
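A minimal sketch of that idea, assuming the server expects a JSON subscription message right after connecting and then sends base64-encoded text frames. `SUBSCRIBE_MESSAGE`, its fields, and the `received_chunks` list are hypothetical placeholders; the real command has to be copied from what DevTools shows.

```python
import base64
import json

# Hypothetical subscription command: replace with the exact message the
# browser sends (DevTools > Network > WS > Messages, green up arrow).
SUBSCRIBE_MESSAGE = {"action": "subscribe", "channel": "<channel>"}

# Raw payload bytes collected from the stream.
received_chunks: list[bytes] = []


def on_open(ws):
    # Replay the command the browser sends right after the handshake.
    ws.send(json.dumps(SUBSCRIBE_MESSAGE))


def on_message(ws, message):
    # websocket-client passes text frames as str and binary frames as bytes.
    if isinstance(message, str):
        # Assumption: text frames carry base64-encoded payloads.
        data = base64.b64decode(message)
    else:
        data = message
    received_chunks.append(data)


def main():
    # Requires: pip install websocket-client
    import websocket

    ws = websocket.WebSocketApp(
        "wss://<address>",
        on_open=on_open,
        on_message=on_message,
    )
    ws.run_forever()


if __name__ == "__main__":
    main()
```

Keeping the payloads as raw bytes, instead of decoding them as text, leaves the data intact for whatever tool is used to interpret it.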