Python3 : BeautifulSoup4 not returning expected value

Question

I'm trying to scrape some data from a website using BS4 under Python 3.6.4, but the value returned is not what I expect:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
soup = BeautifulSoup(page, "html5lib")

price = soup.find("div", {"class" : "fieldPrice sizeC"}).text

print(price)

I should get "39 900 €" but the code returns "47Â 880Â â¬".

NB: Even without JS, the data should be "39 900 €".

Thanks for your help!

Dan-Dev · Accepted Answer

The encoding declared for this page is wrong, so BeautifulSoup is told to decode the raw bytes with the wrong codec. You can force the correct encoding by decoding the content as UTF-8 yourself before parsing:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
# Decode the raw bytes as UTF-8 ourselves so BeautifulSoup does not have to guess the encoding
soup = BeautifulSoup(page.decode('utf-8', 'ignore'), "html5lib")

price = soup.find("div", {"class": "fieldPrice sizeC"}).text

print(price)

Outputs:

49 070 €
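
To see where the stray "Â" and "â¬" characters come from, here is a minimal offline sketch using only the standard library. It assumes the price uses non-breaking spaces (U+00A0) as thousands separators and that the wrong codec was Latin-1; neither is confirmed by the page itself, and the variable names are only illustrative.

```python
# Assumed sample value: "39 900 €" with non-breaking spaces (U+00A0)
# as thousands separators. This is an assumption about the page content.
expected = "39\u00a0900\u00a0\u20ac"

# What travels over the wire: the text encoded as UTF-8 bytes.
raw = expected.encode("utf-8")

# Decoding those bytes with the wrong codec (Latin-1 is assumed here)
# turns each multi-byte UTF-8 sequence into several unrelated characters.
garbled = raw.decode("latin-1")

# Decoding with the right codec recovers the original text. The "ignore"
# error handler mirrors the answer above; it silently drops invalid bytes.
fixed = raw.decode("utf-8", "ignore")

print(repr(garbled))  # repr() makes the invisible characters visible
print(fixed)
```

Each non-breaking space becomes "Â" followed by a non-breaking space, and the euro sign becomes "â", an invisible control character, and "¬", which matches the pattern in the question's output. When passing raw bytes instead of a decoded string, BeautifulSoup also accepts a `from_encoding` argument, which may be a cleaner way to express the same fix.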
