Reputation: 53
As the title states, I am getting a 403 error. The generated URLs are valid; I can print them and open them in my browser just fine. Even though I send the full set of request headers, I still get 403 Forbidden. Can someone help me solve this?
import requests
from bs4 import BeautifulSoup

header = {
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="99", "Microsoft Edge";v="99"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "Windows",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.30"
}

url = "https://www.nadirkitap.com/"
get = requests.get(url, headers=header)
print(get.status_code)  # prints 403
Upvotes: 0
Views: 1095
Reputation: 25196
Take a look at the response text: it says the site is protected by Cloudflare and asks you to enable JavaScript. Since requests does not execute JavaScript, you could use selenium instead.
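A quick way to confirm this (a minimal sketch reusing the header from the question; the marker strings are just an assumption about what a typical Cloudflare challenge page contains):

import requests

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.30"
}

response = requests.get("https://www.nadirkitap.com/", headers=header)
print(response.status_code)                   # 403
# the challenge page usually mentions Cloudflare and asks for JavaScript
print("cloudflare" in response.text.lower())
print("javascript" in response.text.lower())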
The example below creates a BeautifulSoup object from driver.page_source and prints a list of book titles:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service('PATH TO YOUR CHROMEDRIVER')
driver = webdriver.Chrome(service=service)

# load the page in a real browser so the Cloudflare JavaScript check can run
driver.get('https://www.nadirkitap.com/')

# parse the rendered HTML and collect the title attribute of every link that has one
soup = BeautifulSoup(driver.page_source, 'html.parser')
print([t['title'] for t in soup.select('a[title]')])

driver.quit()
['İkinci el kitap, yeni kitap, dergi, efemera', 'İkinci el kitap, yeni kitap, dergi, efemera', 'İkinci el kitap ve yeni kitap', 'Bilim ve Teknik Kitapları', 'Çizgi Roman Kitapları', 'Çocuk Kitapları', 'Dini Kitaplar', 'Edebiyat Kitapları', 'Ekonomi ve İş Dünyası Kitapları', 'Felsefe Kitapları', 'Hukuk Kitapları', 'Osmanlıca Kitaplar',...]
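If you do not want a visible browser window, a headless run may also work (a sketch under the assumption that the Cloudflare check still passes in headless mode; '--headless=new' needs a recent Chrome, older builds use plain '--headless'):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless=new')  # use '--headless' on older Chrome versions

service = Service('PATH TO YOUR CHROMEDRIVER')
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://www.nadirkitap.com/')
print(driver.title)  # quick check that the real page, not the challenge page, loaded
driver.quit()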
Upvotes: 2