Reputation: 3
I've been learning web scraping with Python and BeautifulSoup, but I recently ran into an issue when requesting the second page of results on a site.
Requesting the first page with this code works correctly:
url = "https://PAGE_1_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")
print(response)
But running the same code against the second page returns a 404:
url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")
print(response)
I've tried different headers, but I haven't been able to solve this and would be very grateful if anyone knows of a solution.
Upvotes: 0
Views: 250
Reputation: 12684
Use https instead of http, and add the noscript parameter:
OLD: http://PAGE_2_URL_HERE
NEW: https://PAGE_2_URL_HERE&noscript=false
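A minimal sketch of the corrected request, reusing the code from the question (PAGE_2_URL_HERE and the noscript parameter are the placeholders above, not a real endpoint):

import requests

# Same request as in the question, but over https and with the
# noscript parameter suggested above.
url = "https://PAGE_2_URL_HERE&noscript=false"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
print(response.status_code)  # expect 200 rather than 404 if the scheme was the problem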
Upvotes: 0
Reputation: 2094
Here's an example; you just need to add your browser's cookie:
from bs4 import BeautifulSoup
import requests

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
# Paste the Cookie header value from your browser's dev tools here
cookies = {"cookie": "COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
response = requests.get(url, headers=headers, cookies=cookies)
print(response)  # should now be <Response [200]>
html = response.content
soup = BeautifulSoup(html, features="html.parser")
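If copying the cookie by hand is inconvenient, a requests.Session is an alternative worth trying: it stores any cookies set while fetching page 1 and sends them automatically with later requests. A minimal sketch, assuming the placeholder URLs from the question:

from bs4 import BeautifulSoup
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'})

# Fetch page 1 first so any session cookies it sets are stored...
session.get("https://PAGE_1_URL_HERE")
# ...and sent automatically with the request for page 2.
response = session.get("https://PAGE_2_URL_HERE")
print(response)

soup = BeautifulSoup(response.content, features="html.parser")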
Upvotes: 1