Marcel

Reputation: 1142

How to download webpage in python, url includes GET parameters

I have a URL such as

http://www.example-url.com/content?param1=1&param2=2

In particular, I am testing it on

http://ws.parlament.ch/votes/councillors?concillorNumberFilter=2565&format=json
  1. How do I get the content of such a URL so that the GET parameters are taken into account?

  2. How can I save it to file?

  3. How can I access multiple URLs like this, either in parallel or asynchronously (saving each response to a file in a callback when it arrives, as in JavaScript)?

I have tried

import urllib
urllib.urlretrieve("http://ws.parlament.ch/votes/councillors?concillorNumberFilter=2565&format=json", "file.json")

but I am getting the content of http://ws.parlament.ch/votes/councillors instead of the JSON I want.

Upvotes: 0

Views: 968

Answers (1)

Headhunter Xamd

Reputation: 606

You can use urllib, but other libraries make this easier in some situations. For example, if you also need user authentication, you can use Requests. For this case you can use httplib2; here is a small, clean piece of code that issues a GET request (source).

import httplib2
h = httplib2.Http(".cache")
(resp_headers, content) = h.request("http://example.org/", "GET")

It seems that you need to set the user agent of the connection; otherwise the server refuses to give you the data. I also use urllib2.Request() rather than calling urlretrieve() or urlopen() directly on the URL, mostly because it supports both GET and POST requests and lets the programmer set the user agent.

import urllib2, json

user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
header = { 'User-Agent' : user_agent }

fullurl = "http://ws.parlament.ch/votes/councillors?councillorNumberFilter=2565&format=json"
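# Build the request with the custom header, then open it.
request = urllib2.Request(fullurl, headers=header)
response = urllib2.urlopen(request)
# Parse the JSON body and print the resulting Python object.
print json.loads(response.read())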
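The query string can also be assembled from a dict instead of being typed by hand, which helps avoid a misspelled parameter name. Below is a minimal sketch, assuming Python 3 (where urllib2 became urllib.request); build_request is a hypothetical helper name, not part of any library, and the network call is left commented out.

```python
# Sketch for Python 3 (an assumption; the code above targets Python 2).
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://ws.parlament.ch/votes/councillors"
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"


def build_request(base_url, params):
    """Hypothetical helper: return a GET Request with params in the query string."""
    # urlencode escapes each value and joins the key=value pairs with '&'.
    full_url = base_url + "?" + urlencode(params)
    return Request(full_url, headers={"User-Agent": USER_AGENT})


request = build_request(BASE_URL, {"councillorNumberFilter": 2565, "format": "json"})
print(request.full_url)

# To actually send it (needs network access):
# with urlopen(request) as response:
#     body = response.read()
```

Keeping the parameters in a dict also makes it easy to loop over several councillor numbers and build one request per value.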

Some extra information about headers in Python

If you want to keep using httplib2, here is the code for that:

import httplib2

header = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' }

fullurl = "http://ws.parlament.ch/votes/councillors?councillorNumberFilter=2565&format=json"
http = httplib2.Http(".cache")

response, content = http.request(fullurl, "GET", headers=header)
print content

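The parsed result of json.loads() in the urllib2 example can be saved to a file with json.dump(data, f), where f is a file object opened for writing (the object comes first, the file second). The raw content string from the httplib2 example can be written directly with f.write(content).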
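For saving, and for the part of the question about several URLs at once, here is a rough sketch, again assuming Python 3. It uses a thread pool from concurrent.futures as a stand-in for JavaScript-style callbacks: each result is written to disk as soon as its download completes. The names fetch_json, save_json and download_all are hypothetical helpers, and the fetch parameter exists only so the downloader can be swapped out.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import Request, urlopen

USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"


def fetch_json(url):
    """Download url and return the decoded JSON (needs network access)."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def save_json(data, path):
    """Write data to path as JSON: the object comes first, the file second."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def download_all(jobs, out_dir, fetch=fetch_json, max_workers=4):
    """Fetch every URL in jobs (a dict of file name -> URL) on a thread pool.

    Each result is saved as soon as its download finishes, so files are
    written in completion order, not in the order of jobs.
    Returns the list of saved paths.
    """
    saved = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): name for name, url in jobs.items()}
        for future in as_completed(futures):
            path = os.path.join(out_dir, futures[future])
            save_json(future.result(), path)  # result() re-raises a failed fetch
            saved.append(path)
    return saved
```

A call might look like download_all({"2565.json": fullurl}, "."), with one entry per URL to fetch. Threads suit this kind of task because the work is dominated by waiting on the network.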

Upvotes: 1
