Reputation: 325
I would like to download a PGN text file from this URL: http://www.chess.com/echess/download_pgn?lid=1222621131. I have the following (redacted) code, which is supposed to do this, but it downloads an HTML page instead. What could I be doing wrong?
    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "os"
    )

    func main() {
        url := "http://www.chess.com/echess/download_pgn?lid=1222621131"
        filename := "game.pgn"
        resp, err := http.Get(url)
        ...
        file, err := os.Create(filename)
        defer file.Close()
        ...
        size, err := io.Copy(file, resp.Body)
    }
Upvotes: 0
Views: 2124
Reputation: 5407
First guess is that you've failed to supply the auth, cookies, and headers that a browser session normally supplies. As an experiment, open Chrome in Incognito mode, open the developer tools, and request the URL from your GET in that window. When I do this and look at the first GET in the Network tab, I see the request and response details below. Pay attention to the response code, 302 Found, which means you are being redirected. Now look at the Location header: it reads '/login'. Go's http.Get follows redirects automatically, so I suspect the login page is the very page your code is downloading, since your Go program does not have the login session/cookies for this site like your browser does.
There's a lot of work our browsers do to navigate a website. Coding that up from scratch can be a bit of work. You have to pay attention to cookies, authentication, headers, redirects, and more.
Remote Address: 174.35.7.172:80
Request URL: http://www.chess.com/echess/download_pgn?lid=1222621131
Request Method: GET
Status Code: 302 Found

Response Headers
HTTP/1.1 302 Found
Date: Sat, 25 Jul 2015 20:49:43 GMT
Server: PWS/8.1.20.22
X-Px: ms h0-s1027.p12-sjc ( origin)
P3P: CP="ALL DSP COR LAW CURa ADMa DEVa TAIa OUR BUS IND ONL UNI COM NAV DEM CNT"
Cache-Control: private
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Length: 0
Content-Type: text/html; charset=utf-8
Location: /login
Connection: keep-alive
Set-Cookie: PHPSESSID=pach18her77q4asgsq2heohvj1; path=/; domain=.chess.com; HttpOnly

Request Headers
GET /echess/download_pgn?lid=1222621131 HTTP/1.1
Host: www.chess.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6

Query String Parameters
lid: 1222621131
Upvotes: 2