Anthony

Reputation: 45

Python 3: BeautifulSoup4 not returning expected value

I'm trying to scrape some data from a website using BS4 under Python 3.6.4, but the value returned is not what I expect:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
soup = BeautifulSoup(page, "html5lib")

price = soup.find("div", {"class" : "fieldPrice sizeC"}).text

print(price)

I should get "39 900 €", but the code returns "47 880 â¬".

NB: Even without JS, the data should be "39 900 €".

Thanks for your help!

Upvotes: 1

Views: 121

Answers (2)

Rakesh

Reputation: 82755

Instead of request.content, use request.text.

Example:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.text
soup = BeautifulSoup(page, "html.parser")

price = soup.find("div", {"class" : "fieldPrice sizeC"}).text

print(price)
  • .text automatically decodes the response body to a str, using the encoding that requests detects for the response
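
To see why the choice of decoder matters, here is a minimal offline sketch. The byte string is a hypothetical stand-in for request.content, not data fetched from the site; it only shows how the same UTF-8 bytes read when decoded with the right and the wrong codec.

```python
# Hypothetical stand-in for request.content: the price encoded as UTF-8 bytes.
raw = "39 900 €".encode("utf-8")

# Decoding with the codec the bytes were written in gives the expected text.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 turns the 3-byte euro sign
# into three separate characters (one of them an invisible control code).
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)          # 39 900 €
print(repr(as_latin1))  # '39 900 â\x82¬'
```

The second result matches the "â¬" seen in the question, which suggests the page's UTF-8 bytes were being read with a single-byte Latin codec.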

Upvotes: 1

Dan-Dev

Reputation: 9420

The encoding declaration on this page is wrong, so BeautifulSoup is told to use the wrong encoding. You can force it to use the correct encoding like this:

import requests
from bs4 import BeautifulSoup

link = "https://www.lacentrale.fr/listing?makesModelsCommercialNames=FERRARI&sortBy=priceAsc"
request = requests.get(link)
page = request.content
# Decode the raw bytes as UTF-8 ourselves; 'ignore' drops any undecodable bytes
soup = BeautifulSoup(page.decode('utf-8', 'ignore'), "html5lib")

price = soup.find("div", {"class": "fieldPrice sizeC"}).text

print(price)

Outputs:

49 070 €
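
Note that 'ignore' silently drops any bytes that are not valid UTF-8. If that is a concern, one possible variation is to try a strict UTF-8 decode first and only fall back to another codec when it fails. The sketch below assumes a hypothetical helper name (decode_html) and an ISO-8859-1 fallback; neither comes from the page itself.

```python
def decode_html(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: prefer strict UTF-8, fall back to another codec."""
    try:
        # Strict decoding raises instead of silently dropping bad bytes.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes were not valid UTF-8, so use the assumed fallback codec.
        return raw.decode(fallback)


# Valid UTF-8 input is decoded as UTF-8.
print(decode_html("49 070 €".encode("utf-8")))          # 49 070 €

# ISO-8859-1 input is not valid UTF-8, so the fallback is used.
print(decode_html("prix élevé".encode("iso-8859-1")))   # prix élevé
```

The returned str can then be passed to BeautifulSoup in place of page.decode('utf-8','ignore').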

Upvotes: 2
