Mayank Mittal

Reputation: 75

Web Scraping: Page exists but getting 404 using requests/urllib

I am trying to scrape the following page: http://usbcdirectory.com/listing/1-us-black-chambers

I am using Python 3.5.0

Here is my code:

import urllib.request

urllib.request.urlopen('http://usbcdirectory.com/listing/1-us-black-chambers')

This raises a 404 Not Found error, even though the page loads fine when I open it in a browser.

I searched for a solution to this problem, and here is what I have tried so far:

  1. Switching from urllib to requests: I did this and still got a 404 status code:

     >>> requests.get('http://usbcdirectory.com/listing/1-us-black-chambers')
     <Response [404]>

  2. Checking the link: it is correct.

  3. Checking whether the page is generated using JavaScript: I believe it is not.

What is the issue with the web page here? Are they blocking scraping in some way, or is it an issue with the URL?

Upvotes: 3

Views: 6050

Answers (2)

The same thing happened to me; thanks for sharing the solution. I also tried sending my own browser's User-Agent string, and it worked. I used this code:

import requests

url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'User-Agent': 'your user agent'}  # replace with your browser's User-Agent string
response = requests.get(url, headers=headers)
print(response.status_code)
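
One plausible reason a custom User-Agent helps is that Python HTTP clients identify themselves by default, and some servers treat that as a bot. The sketch below only prints what urllib would send; whether this particular site filters on it is an assumption, and `default_urllib_user_agent` is a hypothetical helper name.

```python
import urllib.request


def default_urllib_user_agent():
    """Return the User-Agent string urllib sends when none is set."""
    # A freshly built opener carries its default headers in `addheaders`,
    # a list of (name, value) tuples.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders).get("User-agent")


if __name__ == "__main__":
    # Prints something like "Python-urllib/3.x", depending on your Python version.
    print(default_urllib_user_agent())
```

If the server rejects that default value, replacing it with a browser-like string (as in the code above) would explain the different response.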

Upvotes: 0

ritiek

Reputation: 2817

As you guessed, they are probably blocking your request. You can pass custom headers so that your request looks more like one from a real browser:

import requests

url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
headers = {'Accept': 'text/html'}
response = requests.get(url, headers=headers)
print(response.status_code)
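
Since the question started with urllib, here is a rough equivalent using only the standard library. It is a sketch, not a verified fix: `build_request` and `fetch_status` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and which header the server actually checks is an assumption.

```python
import urllib.error
import urllib.request


def build_request(url, headers):
    """Create a Request object carrying the given custom headers."""
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, headers=None):
    """Return the HTTP status code for url, even for error responses."""
    request = build_request(url, headers or {})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is on the exception.
        return err.code


if __name__ == "__main__":
    url = 'http://usbcdirectory.com/listing/1-us-black-chambers'
    # Compare the default request with one that sends an Accept header.
    print(fetch_status(url))
    print(fetch_status(url, {'Accept': 'text/html'}))
```

Comparing the two status codes side by side shows whether the extra header changes the server's response.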

Upvotes: 6
