Reputation: 1227
I want to scrape stock data from the stooq.pl website. The quotes there are updated live, so I figured there must be some AJAX request fetching them, and that it would be easier to analyze that XHR than to scrape the page every ~1 s for new data. However, the only request I found was a strange one that: 1. (almost) never ends; 2. returns ERR_EMPTY_RESPONSE when opened directly in a browser. Nevertheless, the data on the page somehow gets updated. I tried to reverse-engineer the minified JS, but nothing caught my attention. What kind of sorcery is this, and can I make it work as intended?
Below are the URL of the sample page I tested, a screenshot of the request in progress, and the request data from Chrome DevTools:
http://stooq.pl/q/?s=eurpln&c=10d&t=l&a=ln&b=0
**General**
Remote Address:178.32.86.87:80
Request URL:http://aq.stooq.net/?q=aqdat1+wig201+eurpln3+grl1+cig1+usdpln1+chfpln1+eurusd1+gbppln1
Request Method:POST
Status Code:200 OK
**Response Headers**
HTTP/1.1 200 OK
Date: Thu, 01 Oct 2015 09:37:25 GMT
Server: Apache
Expires: Sat, 1 Jan 2000 12:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Access-Control-Allow-Origin: *
Keep-Alive: timeout=3
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/event-stream
**Request Headers**
POST /?q=aqdat1+wig201+eurpln3+grl1+cig1+usdpln1+chfpln1+eurusd1+gbppln1 HTTP/1.1
Host: aq.stooq.net
Connection: keep-alive
Content-Length: 0
Accept: text/event-stream
Origin: http://stooq.pl
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
Content-type: application/x-www-form-urlencoded
Referer: http://stooq.pl/q/?s=eurpln&c=10d&t=l&a=ln&b=0
Accept-Encoding: gzip, deflate
Accept-Language: pl,en-US;q=0.8,en;q=0.6,es;q=0.4
**Query String Parameters**
q=aqdat1+wig201+eurpln3+grl1+cig1+usdpln1+chfpln1+eurusd1+gbppln1
Upvotes: 1
Views: 258
Reputation: 6561
This is your clue:
Content-Type: text/event-stream
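This is Server-Sent Events (SSE), a standardized way to stream data from the server over a plain old HTTP connection. The server keeps the response open and writes a new event whenever it has data, which is why the request (almost) never ends.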
https://html.spec.whatwg.org/multipage/comms.html#server-sent-events
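To read such a stream outside the browser, something along these lines might work. It is only a minimal sketch using the Python standard library: `parse_sse` and `stream_events` are hypothetical helper names, the parser covers only the `data`, `event` and `id` fields of the event-stream format, and the empty POST body with `Accept: text/event-stream` simply mirrors the request captured in the question. Whether stooq's server accepts a non-browser client, and what its event payload looks like, is not something I have verified.

```python
from typing import Iterable, Iterator
import urllib.request


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event_type, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the accumulated event, if it carries data.
            if data:
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips a single leading space
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            last_id = value
        # Other fields (e.g. "retry") are ignored in this sketch.


def stream_events(url: str) -> Iterator[dict]:
    """Open the stream like the captured request: POST with an empty body."""
    req = urllib.request.Request(
        url,
        data=b"",  # assumption: mirrors "Content-Length: 0" from the question
        method="POST",
        headers={"Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields lines as the server sends them.
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Looping over `stream_events(...)` with the Request URL from the question should then yield one dict per update for as long as the connection stays open; each `data` string would still need to be decoded according to whatever format the site uses. In the browser, the built-in `EventSource` API consumes the same format, although it issues GET requests rather than POST.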
Upvotes: 3