Mattia_DL

Reputation: 23

How to avoid 403 problem using BeautifulSoup and headers?

I am using requests together with BeautifulSoup to develop a web-scraping program in Python. Unfortunately, I get a 403 error (even when using a header). Here is my code:

from bs4 import BeautifulSoup
from requests import get

headers_m = {'User-Agent':
             'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
sapo_m = "https://www.idealista.it/vendita-case/milano-milano/"

response_m = get(sapo_m, headers=headers_m)

Upvotes: 2

Views: 1176

Answers (2)

Daniel Danielecki

Reputation: 10580

Simply use Chrome as the User-Agent.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://...", headers={"User-Agent": "Chrome"}).content,
    'html.parser')

Upvotes: 0

yascool

Reputation: 41

This is not a general Python question. The site blocks such straightforward scraping attempts; you need to find a set of headers (specific to this site) that will pass its validation.
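Since the working header set is site-specific, here is a minimal sketch of a more browser-like set of request headers. The field values below are assumptions for illustration, not a verified combination for idealista.it; you would typically copy the exact headers your own browser sends (visible in the browser's developer tools, Network tab) and adjust from there.

```python
def build_browser_headers():
    """Return a header set that mimics a typical browser request.

    These values are illustrative assumptions; inspect a real browser
    request to the target site and mirror its headers as closely as
    possible.
    """
    return {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"),
        "Accept": ("text/html,application/xhtml+xml,application/xml;"
                   "q=0.9,*/*;q=0.8"),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.idealista.it/",
        "Connection": "keep-alive",
    }

# Usage (against the live site):
# import requests
# response = requests.get(
#     "https://www.idealista.it/vendita-case/milano-milano/",
#     headers=build_browser_headers())
# print(response.status_code)
```

If a fuller header set still returns 403, the site is likely using bot detection beyond header inspection (cookies, TLS fingerprinting, or JavaScript challenges), which plain requests cannot satisfy.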


Upvotes: 1
