Reputation: 111
I'm trying to scrape data from an Autotrader page. I managed to grab a link to every offer on the page, but when I try to get the data from each offer I get a 403 status, even though I'm sending a header. What more can I do to get past it?
headers = {"User Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/85.0.4183.121 Safari/537.36'}
page = requests.get("https://www.autotrader.co.uk/car-details/202010145012219", headers=headers)
print(page.status_code) # 403 forbidden
content_of_page = page.content
soup = bs4.BeautifulSoup(content_of_page, 'lxml')
title = soup.find('h1', {'class': 'advert-heading__title atc-type-insignia atc-type-insignia--medium '})
print(title.text)
[For people in the same position: Autotrader uses Cloudflare to protect every "car-details" page, so I would suggest using Selenium, for example.]
Upvotes: 0
Views: 1303
Reputation: 1178
If you can get the data via your browser, i.e. you can see it on the website, then you can likely replicate that request with requests.
Briefly, you need the headers in your request to match the browser's request: open your browser's developer tools, find the request for the page in the Network tab, and copy its headers into your script. Your browser sends tons of extra headers, and you never know which ones the server actually checks, so copying all of them will save you a lot of time.
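A minimal sketch of that approach (the header values below are illustrative placeholders, not a guaranteed working set; paste the real ones from your own browser session):

    import requests

    # headers copied wholesale from the browser's Network tab --
    # the exact values here are examples, use your own
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-GB,en;q=0.9",
        "Referer": "https://www.autotrader.co.uk/",
        "Connection": "keep-alive",
    }

    page = requests.get(
        "https://www.autotrader.co.uk/car-details/202010145012219",
        headers=headers,
    )
    print(page.status_code)  # if the copied headers are enough, this should be 200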
However, this might fail if there's protection against blunt request copies, e.g. temporary tokens that prevent the requests from being reused. In that case you need Selenium (browser emulation/automation); it's not difficult to set up, so it's worth using.
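For the Selenium route, a minimal sketch (assuming Chrome and a matching chromedriver are installed; the class name is taken from the question and may change on the site):

    import bs4
    from selenium import webdriver

    # a real browser executes the Cloudflare JavaScript checks,
    # then we parse the rendered page with BeautifulSoup
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.autotrader.co.uk/car-details/202010145012219")
        soup = bs4.BeautifulSoup(driver.page_source, "lxml")
        title = soup.find("h1", {"class": "advert-heading__title atc-type-insignia atc-type-insignia--medium "})
        print(title.text if title else "title not found")
    finally:
        driver.quit()

Cloudflare may still challenge automated browsers in some cases, but a real browser session can pass the JavaScript checks that plain requests cannot.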
Upvotes: 2