extract data from https site into python using urllib (your request cannot be completed error)

Question

I've been trying to pull the contents of an HTTPS website into Python using urllib. My code is four lines:

import urllib
fhand = urllib.urlopen('https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startPage=1#search-results')

for line in fhand:
    print line.strip()

The connection appears to work, since the page is opened from Python. However, instead of the page content I get error messages in the title, heading and paragraph elements of the output, as shown below. I had expected a series of HTML tags containing the data available on the website, such as address, base rates and case number (i.e. the HTML I can see in the Elements panel of Google Chrome's developer tools). Can anyone guide me towards getting this data into Python, please?

Thanks & Regards

Your request cannot be completed - GOV.UK

Sorry, there was a problem handling your request.
Please try again shortly.

Amin Etesamian · Accepted Answer

Some websites block requests when the User-Agent header is missing or is one they don't want to serve. Try adding a User-Agent to the headers of your request:

import urllib2

# Send a browser-like User-Agent; some sites reject requests without one.
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startPage=1#search-results'
req = urllib2.Request(url, headers=headers)
f = urllib2.urlopen(req)
s = f.read()  # raw HTML of the response body
print s
f.close()

Alternatively, you can pip install requests and use print(requests.get(url, headers=headers).text).
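
The code in this thread is Python 2 (urllib2 and the print statement). For readers on Python 3, the same idea can be sketched with urllib.request, as below. This is only an illustration under a few assumptions: build_request and fetch are hypothetical helper names, the 10-second timeout and the UTF-8 fallback are arbitrary choices, and there is no guarantee that the site accepts this particular User-Agent value.

```python
import urllib.request

URL = ('https://www.tax.service.gov.uk/view-my-valuation/'
       'list-valuations-by-postcode?postcode=w1a&startPage=1#search-results')


def build_request(url, user_agent='Mozilla/5.0'):
    """Return a Request that carries an explicit User-Agent header.

    Hypothetical helper: the default value mirrors the one used in the answer.
    """
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, timeout=10):
    """Fetch the page and return its body decoded as text.

    The timeout and the UTF-8 fallback are assumptions, not site requirements.
    """
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset declared by the server, if any.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


# Usage (performs a real network request):
#     print(fetch(URL))
```

Keeping the request construction separate from the network call makes it easy to check which headers will be sent before contacting the server.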
