jax

Reputation: 840

Bad request trying to scrape page using Python 3

I am trying to scrape the following page using Python 3, but I keep getting HTTP Error 400: Bad Request. Some previous answers suggest using urllib.quote, which didn't work for me because it is Python 2 only. I also tried the following code, as suggested in another post, and it still didn't work.

import urllib.request
from requests.utils import requote_uri

url = requote_uri('http://www.txhighereddata.org/Interactive/CIP/CIPGroup.cfm?GroupCode=01')
with urllib.request.urlopen(url) as response:
    html = response.read()

Upvotes: 0

Views: 393

Answers (1)

Arount

Reputation: 10431

The server rejects requests whose User-Agent HTTP header does not look like it comes from a browser.
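To see what urllib sends when no User-Agent is set, you can inspect the default headers of an opener. This is only a small sketch; that this particular server filters on exactly this string is an assumption.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the
# headers urllib adds to a request that does not set them itself.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Typically prints something like "Python-urllib/3.x", depending on your version.
print(default_headers["User-agent"])
```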

Just pick a browser's User-Agent string and set it as a header on your request:

import urllib.request

url = 'http://www.txhighereddata.org/Interactive/CIP/CIPGroup.cfm?GroupCode=01'
# Pretend to be a regular browser (here: Firefox on Linux)
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
}

# Attach the headers to a Request object and open that instead of the bare URL
request = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(request) as response:
    html = response.read()
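If you fetch several pages, the same idea can be wrapped in a small helper. The sketch below is one possible way to do it, not the only one: the names build_request, fetch and BROWSER_UA are made up for illustration, and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request

# Hypothetical default; any current browser User-Agent string should work.
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the status (e.g. 400), err.reason the short message
        print(f"HTTP {err.code}: {err.reason}")
        return None
```

Calling fetch(url) with the URL from the question should then return the page as bytes, or print the status code and return None if the server still refuses the request.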

Upvotes: 2
