Matt

Reputation: 4387

Scraping WebSocket data with Python

I'm trying to scrape WebSocket data (frames) in Python from a website that uses SockJS, but I don't really know how to do that.

URL: ws://example.io/sockjs/wkzeza/websocket

In the web debugger I can see these response headers:

Date: Sun, 27 Aug 2017 09:42:15 GMT
Connection: upgrade
Set-Cookie: oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/
Upgrade: websocket
Sec-WebSocket-Accept: HA0gkvrFCF7qjVYIDvSBa5sJKkg=
Sec-WebSocket-Extensions: permessage-deflate
Server: nginx
CF-RAY: 394e146d34a12f65-MAD

Normally, the response headers alone should be enough to retrieve the data from the frames, right?

I've tried this code, but I can't read the content:

from websocket import create_connection
import json

headers = json.dumps({'Date': 'Sun, 27 Aug 2017 09:42:15 GMT',
'Connection': 'upgrade',
'Set-Cookie': 'oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/',
'Upgrade': 'websocket',
'Sec-WebSocket-Accept': 'HA0gkvrFCF7qjVYIDvSBa5sJKkg=',
'Sec-WebSocket-Extensions': 'permessage-deflate',
'Server': 'nginx',
'CF-RAY': '394e146d34a12f65-MAD'})

ws = create_connection('ws://example.io/sockjs/wkzeza/websocket', header=headers)
response = ws.recv_data_frame()
print(response)

>> [1, <websocket._abnf.ABNF at 0x7efe29aa0da0>]

Thanks for your help.

Upvotes: 5

Views: 11443

Answers (1)

Punnerud

Reputation: 8051

Check the traffic in Chrome (or another browser) to see how you need to negotiate to begin the flow of data. Once the negotiation succeeds you can do something like:

while True:
    # recv() blocks until the next message arrives and returns its payload
    message = ws.recv()
    print(message)
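Since the endpoint in the question is a SockJS one, the text returned by ws.recv() should follow the SockJS framing: "o" when the session opens, "h" for heartbeats, "a[...]" for a JSON array of messages and "c[...]" when the server closes. A minimal sketch of decoding those frames follows; parse_sockjs_frame and listen are hypothetical helper names, and listen assumes the websocket-client package from the question is installed.

```python
import json


def parse_sockjs_frame(raw):
    """Split one SockJS text frame into (kind, payload)."""
    if raw == "o":
        return ("open", None)
    if raw == "h":
        return ("heartbeat", None)
    if raw.startswith("a"):
        # "a" is followed by a JSON array of message strings
        return ("messages", json.loads(raw[1:]))
    if raw.startswith("c"):
        # "c" is followed by a JSON array: [code, reason]
        return ("close", json.loads(raw[1:]))
    return ("unknown", raw)


def listen(url):
    """Print every decoded frame until the server closes (needs network)."""
    # Imported here so the parser above works without the package installed.
    from websocket import create_connection

    ws = create_connection(url)
    try:
        while True:
            kind, payload = parse_sockjs_frame(ws.recv())
            print(kind, payload)
            if kind == "close":
                break
    finally:
        ws.close()
```

Calling listen('ws://example.io/sockjs/wkzeza/websocket') would then print each frame as it arrives. Whether the server sends anything before you send a message of your own depends on the site.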

Here is an example of sent and received WebSocket traffic in Chrome:

[Screenshot: WebSocket frames in Chrome]

Just copy one of the sent messages and pass it to ws.send(). Example:

ws.send('''{"H":"publicmaphub","M":"getData","A":[],"I":1}''')
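If you need to vary the message instead of pasting a fixed string, you could build it with json.dumps. This is only a sketch: build_message is a hypothetical helper, the keys H, M, A and I are copied from the captured message above, and reading them as hub, method, arguments and a call counter is my assumption.

```python
import json


def build_message(hub, method, args=None, invocation_id=1):
    """Serialize a message in the same shape as the captured one."""
    payload = {
        "H": hub,                # assumed: hub name
        "M": method,             # assumed: method to call
        "A": args or [],         # assumed: argument list
        "I": invocation_id,      # assumed: counter for matching replies
    }
    # Compact separators reproduce the captured string without spaces.
    return json.dumps(payload, separators=(",", ":"))
```

Usage would be ws.send(build_message("publicmaphub", "getData")), which sends the same text as the literal string above.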

The example is from this live view of buses in Stavanger, Norway: https://www.kolumbus.no/ruter/kart/sanntidskart-internt/?c=58.974238,5.691347,14&lf=all&vt=bus,ferry
(On that page you first need to get a token over HTTPS, then connect with WebSocket, and then make another HTTPS request to start the traffic. After that you can use ws.send() and ws.recv() to start getting data.)
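The three-step sequence described above (token over HTTPS, WebSocket connect, second HTTPS request) can be sketched roughly like this. Everything here is hypothetical: the helper names, the "token" query parameter and the idea that the token goes in the URL are placeholders, so copy the real requests from the browser's network tab.

```python
from urllib.parse import urlencode


def build_ws_url(base_url, token):
    # "token" as the parameter name is a placeholder, not the site's real one.
    return base_url + "?" + urlencode({"token": token})


def start_stream(fetch_token, open_socket, request_start, base_url):
    """Run the three steps in order and return the open socket.

    fetch_token, open_socket and request_start are callables you supply,
    for example thin wrappers around an HTTP client and
    websocket.create_connection.
    """
    token = fetch_token()                            # 1. HTTPS: get a token
    ws = open_socket(build_ws_url(base_url, token))  # 2. open the WebSocket
    request_start(token)                             # 3. HTTPS: start traffic
    return ws
```

Passing the network calls in as functions keeps the ordering logic separate from the site-specific requests, so you can check the sequence with fake callables before pointing it at the real server.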

Upvotes: 7
