Reputation: 4387
I'm trying to scrape WebSocket data (frames) in Python from a website that uses SockJS, but I don't really know how to do that.
URL: ws://example.io/sockjs/wkzeza/websocket
In the web debugger I can see these response headers:
Date: Sun, 27 Aug 2017 09:42:15 GMT
Connection: upgrade
Set-Cookie: oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/
Upgrade: websocket
Sec-WebSocket-Accept: HA0gkvrFCF7qjVYIDvSBa5sJKkg=
Sec-WebSocket-Extensions: permessage-deflate
Server: nginx
CF-RAY: 394e146d34a12f65-MAD
Normally, with only the response headers I should be able to retrieve the data from the frames, right?
I've tried the following code, but I can't read the content:
from websocket import create_connection
import json
headers = json.dumps({'Date': 'Sun, 27 Aug 2017 09:42:15 GMT',
                      'Connection': 'upgrade',
                      'Set-Cookie': 'oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/',
                      'Upgrade': 'websocket',
                      'Sec-WebSocket-Accept': 'HA0gkvrFCF7qjVYIDvSBa5sJKkg=',
                      'Sec-WebSocket-Extensions': 'permessage-deflate',
                      'Server': 'nginx',
                      'CF-RAY': '394e146d34a12f65-MAD'})
ws = create_connection('ws://example.io/sockjs/wkzeza/websocket', header=headers)
response = ws.recv_data_frame()
print(response)
>> [1, <websocket._abnf.ABNF at 0x7efe29aa0da0>]
Thanks for your help.
Upvotes: 5
Views: 11443
Reputation: 8051
Check the traffic in Chrome (or another browser) to see how the client negotiates the start of the data flow. Once the negotiation succeeds, you can do something like:
while True:
    message = ws.recv()  # blocks until the server sends the next message
    print(message)
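If the endpoint speaks the SockJS protocol (the session-style .../<server>/<session>/websocket URLs do; SockJS's raw .../websocket endpoint does not), each string returned by ws.recv() is a SockJS frame rather than the bare application message: "o" when the session opens, "h" for heartbeats, "a" followed by a JSON array of message strings, and "c" followed by [code, reason] on close. Note also that ws.recv() returns the payload directly, whereas recv_data_frame() (used in the question) returns an (opcode, frame) pair whose payload sits in frame.data. Below is a minimal decoder sketch under that assumption, reusing the URL from the question; parse_sockjs_frame, encode_sockjs_message and main are illustrative names, not part of websocket-client.

```python
import json
from typing import Any, Tuple


def parse_sockjs_frame(raw: str) -> Tuple[str, Any]:
    """Split one SockJS frame (as returned by ws.recv()) into (kind, payload)."""
    if not raw:
        raise ValueError("empty frame")
    kind, body = raw[0], raw[1:]
    if kind == "o":  # session opened
        return "open", None
    if kind == "h":  # heartbeat sent by the server to keep the session alive
        return "heartbeat", None
    if kind == "a":  # JSON array of message strings
        return "messages", json.loads(body)
    if kind == "c":  # close frame: [code, reason]
        return "close", json.loads(body)
    raise ValueError(f"unknown SockJS frame type: {kind!r}")


def encode_sockjs_message(text: str) -> str:
    """Wrap an outgoing message the way SockJS clients send it: a JSON array of strings."""
    return json.dumps([text])


def main() -> None:
    # Imported here so the helpers above can be used without websocket-client installed.
    from websocket import create_connection  # pip install websocket-client

    ws = create_connection("ws://example.io/sockjs/wkzeza/websocket")
    try:
        while True:
            kind, payload = parse_sockjs_frame(ws.recv())
            if kind == "messages":
                for message in payload:
                    print(message)
            elif kind == "close":
                print("server closed the session:", payload)
                break
    finally:
        ws.close()
```

Calling main() runs the loop. Each decoded message is itself a string (often JSON), so a second json.loads may be needed, and anything sent back should go through encode_sockjs_message before ws.send().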
Here is an example of upstream/downstream WebSocket traffic in Chrome.
Just copy the outgoing (up) message and pass it to ws.send(). For example:
ws.send('''{"H":"publicmaphub","M":"getData","A":[],"I":1}''')
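Instead of pasting the string by hand, the same message can be built with json.dumps. The H/M/A/I keys are copied from the example above; they match the hub-invocation shape of classic ASP.NET SignalR (hub, method, arguments, invocation id). hub_invocation is a hypothetical helper name used only for illustration.

```python
import json


def hub_invocation(hub: str, method: str, args: list, invocation_id: int) -> str:
    """Serialize a hub call in the H/M/A/I shape used by the message copied from Chrome."""
    return json.dumps(
        {"H": hub, "M": method, "A": args, "I": invocation_id},
        separators=(",", ":"),  # compact output, identical to the copied message
    )
```

Usage: ws.send(hub_invocation("publicmaphub", "getData", [], 1)). Using a fresh I for each new call makes it easier to match the server's replies to the requests that caused them.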
The example comes from this live map of buses in Stavanger, Norway:
https://www.kolumbus.no/ruter/kart/sanntidskart-internt/?c=58.974238,5.691347,14&lf=all&vt=bus,ferry
(On that page you first need to get a token over HTTPS, then connect with WebSocket, and then make another HTTPS request to start the traffic. After that, you can use ws.send() and ws.recv() combinations to start getting data.)
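The token / connect / start sequence described in that parenthetical can be sketched as follows. Because the message format matches classic ASP.NET SignalR, this sketch assumes SignalR-style negotiate, connect and start endpoints; the base URLs, the ConnectionToken response field, and every query-parameter name are assumptions to check against the real requests in the browser's network tab. It uses requests for the HTTPS calls and websocket-client for the socket.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# HYPOTHETICAL endpoints: replace with the URLs the browser's network tab shows.
BASE_HTTP = "https://example.com/signalr"
BASE_WS = "wss://example.com/signalr"
HUB = "publicmaphub"  # hub name taken from the message in the answer


def build_url(base: str, endpoint: str, token: Optional[str] = None) -> str:
    """Build a negotiate/connect/start URL.

    The parameter names follow the classic SignalR pattern; this is an
    assumption, not something confirmed for the site in the answer.
    """
    params = {"connectionData": json.dumps([{"name": HUB}])}
    if token is not None:
        # connect and start carry the transport and the token returned by negotiate
        params["transport"] = "webSockets"
        params["connectionToken"] = token
    return f"{base}/{endpoint}?{urlencode(params)}"


def run() -> None:
    """Token over HTTPS, WebSocket connect, HTTPS start, then send/recv."""
    # Third-party imports stay inside so build_url works without them installed.
    import requests  # pip install requests
    from websocket import create_connection  # pip install websocket-client

    session = requests.Session()
    # 1. Get a token over HTTPS ("ConnectionToken" is an assumed field name).
    token = session.get(build_url(BASE_HTTP, "negotiate")).json()["ConnectionToken"]
    # 2. Connect with WebSocket, forwarding any cookies the HTTPS call set.
    cookies = "; ".join(f"{k}={v}" for k, v in session.cookies.items())
    ws = create_connection(build_url(BASE_WS, "connect", token), cookie=cookies)
    # 3. Another HTTPS request to start the traffic.
    session.get(build_url(BASE_HTTP, "start", token))
    # 4. Now the send/recv combination should return data.
    ws.send('{"H":"publicmaphub","M":"getData","A":[],"I":1}')
    try:
        while True:
            print(ws.recv())
    finally:
        ws.close()
```

Calling run() performs the whole sequence. Keeping build_url separate makes it easy to compare the generated URLs with the ones the browser actually sends before touching the network.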
Upvotes: 7