cravoms

Reputation: 3

Error 404 with BeautifulSoup only on some URLs within a site

I've been learning scraping with Python and BeautifulSoup, but I recently ran into an issue when requesting the second page of results within a site.

Requesting the first page with this code works correctly:

url = "https://PAGE_1_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

But requesting the second page with the same code returns a 404.

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

I've tried different headers, but I haven't been able to solve this and would be very grateful if anyone knows of a solution.

Upvotes: 0

Views: 250

Answers (2)

jose_bacoy

Reputation: 12684

Use https instead of http (note the noscript parameter added to the corrected URL):

OLD: http://PAGE_2_URL_HERE

NEW: https://PAGE_2_URL_HERE&noscript=false
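
For reference, a minimal sketch of the request after this change, reusing the question's headers; the placeholder URL stands in for the real page-2 address:

from bs4 import BeautifulSoup
import requests

# The https URL with the noscript parameter, as in the corrected link above
url = "https://PAGE_2_URL_HERE&noscript=false"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
print(response.status_code)  # expect 200 if the http scheme was the problem
soup = BeautifulSoup(response.content, features="html.parser")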

Upvotes: 0

GiovaniSalazar

Reputation: 2094

Here is an example; you just need to add your browser cookie:

from bs4 import BeautifulSoup
import requests

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
# Paste the cookie value from your browser in place of the placeholder
cookies = {"cookie": "COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
response = requests.get(url, headers=headers, cookies=cookies)
print(response)  # should show <Response [200]> once the cookie is accepted
html = response.content
soup = BeautifulSoup(html, features="html.parser")
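
If you are unsure where to get the cookie, your browser's developer tools (Network tab) show the Cookie request header sent with the page request; copy that value into the cookies dictionary above.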

Upvotes: 1
