Reputation: 75
I am trying to scrape the following page: http://usbcdirectory.com/listing/1-us-black-chambers
I am using Python 3.5.0
Here is my code:
import urllib.request
urllib.request.urlopen('http://usbcdirectory.com/listing/1-us-black-chambers')
Using the above I am getting a 404 not found error. However, the page exists when I open it from the browser.
I searched for a solution to this problem and tried the requests library, with the same result:
>>> requests.get('http://usbcdirectory.com/listing/1-us-black-chambers')
<Response [404]>
I checked the link, and it is correct.
I tried to find out if the page is generated using JavaScript. I believe it is not.
What is the issue with the web page here? Are they blocking scraping in some way, or is it an issue with the URL?
Upvotes: 3
Views: 6050
Reputation: 1
The same thing happened to me. Thanks for sharing the solution. I also tried sending my own browser's User-Agent string, and it worked. I used this code:
import requests
url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'User-Agent': 'your user agent'}
response = requests.get(url, headers=headers)
print(response.status_code)
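For context on why setting a User-Agent can matter, here is a small sketch that shows the User-Agent string urllib sends when you do not set one. It uses only the standard library. The helper name default_user_agent is made up for illustration, and whether this particular site filters on that value is an assumption, not something confirmed here.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent string urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # opener.addheaders is a list of (name, value) pairs added to every request
    return dict(opener.addheaders).get('User-agent')


# Prints something of the form "Python-urllib/<version>"
print(default_user_agent())
```

A server that rejects non-browser clients could plausibly key on a value like this, which would explain why replacing it changes the response.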
Upvotes: 0
Reputation: 2817
As you guessed, they are probably blocking your request. You can pass custom headers so that your request looks more like one sent by a real browser:
import requests
url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)
print(response.status_code)
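Since the question uses urllib.request rather than requests, the same idea can be sketched with the standard library. This is only an illustration: the helper names build_request and fetch_status are hypothetical, and 'Mozilla/5.0' is a placeholder value, not a header known to be accepted by this site.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent='Mozilla/5.0', accept='text/html'):
    """Build a Request that carries browser-like headers (placeholder values)."""
    headers = {'User-Agent': user_agent, 'Accept': accept}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url):
    """Return the HTTP status code for url, including error codes such as 404."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is still available on the exception
        return err.code


# Example call (performs a network request):
# print(fetch_status('http://usbcdirectory.com/listing/1-us-black-chambers'))
```

Catching HTTPError lets you compare status codes with and without the extra headers instead of stopping at the first exception.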
Upvotes: 6