Reputation: 17
I'm trying to scrape this page to get the average price per square meter as well as the price bracket. I've overcome my first obstacle (cf. page, by using select instead of findAll), but now I get the wrong results: I want the <ul><li> elements that hold my figures, but I end up with other <ul><li> elements instead (see images below).
I know it has something to do with child nodes and the little arrows next to the <li> tags, but I can't figure it out. What can I do to get the text "2 992 €" and the bracket text "1962 € à 4 158 €"?
Here is my code:
import requests
from bs4 import BeautifulSoup as bs
res=requests.get("https://www.meilleursagents.com/prix-immobilier/marseille-13000/")
soup=bs(res.text,"html.parser")
infos=soup.select("li",class_="big-number")
print(infos)
Upvotes: 0
Views: 602
Reputation: 84465
The site appears to check for a valid browser User-Agent (possibly against a list kept on the server), and the returned text also contains Unicode characters that need normalizing:
import requests
from bs4 import BeautifulSoup
import unicodedata
import re
# Browser-like User-Agent; the site appears to reject the default one sent by requests.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
r = requests.get('https://www.meilleursagents.com/prix-immobilier/marseille-13000/', headers=headers)
soup = BeautifulSoup(r.text, "lxml")
for i in soup.select('.prices-summary__price-range'):
    # Keep the 2nd and 3rd <li>; normalize no-break spaces, then drop line breaks and indentation.
    print([re.sub(r'\n\s+', '', unicodedata.normalize('NFKD', j.text.strip())) for j in i.select('li:nth-child(n+2):nth-child(-n+3)')])
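To see the text cleanup in isolation, here is a minimal standard-library sketch. The helper name clean_price_text and the sample string are made up for illustration; the sample only assumes that the page separates digits with no-break spaces, which is what the NFKD step is there to handle.

```python
import re
import unicodedata


def clean_price_text(raw: str) -> str:
    """Hypothetical helper: tidy the text of one <li> element."""
    # NFKD maps U+00A0 (no-break space) and U+202F (narrow no-break space)
    # to a plain ASCII space.
    normalized = unicodedata.normalize("NFKD", raw.strip())
    # Remove each newline together with the whitespace that follows it.
    return re.sub(r"\n\s+", "", normalized)


# Assumed sample resembling the raw text of a price <li>.
sample = "\n      2\u00a0992\u00a0\u20ac\n    "
print(repr(clean_price_text(sample)))  # prints '2 992 €'
```

Without the normalization step, the string would still contain "\xa0" characters, which look like spaces when printed but do not compare equal to them.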
Upvotes: 1
Reputation: 8302
Here is a solution you can try, using the parent tag ul instead of li.
for ul in soup.find_all("ul", {"class": "prices-summary__price-range"}):
    for li in ul.find_all("li"):
        # li.string is None when the <li> has more than one child node
        if li.string:
            print(li.string.strip())

Output:
Prix m2 moyen
2 168 €
de
1 421 €
à
3 011 €
....
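The following sketch shows why scoping the search to the parent ul helps, and what the li.string check filters out. The HTML is hypothetical (the real page's markup may differ); only the class name prices-summary__price-range comes from the answers above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, for illustration only.
html = """
<ul class="prices-summary__price-range">
  <li>Prix m2 moyen</li>
  <li class="big-number">2 992 €</li>
  <li>de <strong>1 962 €</strong> à <strong>4 158 €</strong></li>
</ul>
<ul class="other-list"><li>unrelated</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Selecting the parent <ul> first keeps <li> elements of other lists out.
ul = soup.select_one("ul.prices-summary__price-range")
items = ul.find_all("li")

# .string is None for a tag with several child nodes (third <li> here).
strings = [li.string for li in items]

# get_text() joins the text of all descendants, so nested tags are included.
texts = [li.get_text(" ", strip=True) for li in items]

print(strings)
print(texts)
```

If the bracket values sit in nested tags, as assumed here, get_text() may be the safer choice than .string, since .string silently yields None for such elements.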
Upvotes: 0
Reputation: 488
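Open the browser's dev tools and select the element. Then right-click it and choose the option to copy its CSS selector; the browser generates a selector that matches that element, which you can pass to select. Alternatively, you can copy its XPath, though BeautifulSoup itself does not evaluate XPath (lxml does).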
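As a rough sketch of how a copied selector can be used: the markup and the selector string below are invented for illustration, and a selector copied from the real page will look different. Note also that the dev tools show the DOM after JavaScript has run, so a copied selector may not match the raw HTML that requests receives.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
html = """
<div id="summary">
  <ul>
    <li>Prix m2 moyen</li>
    <li>2 992 €</li>
  </ul>
</div>
"""

# Hypothetical selector, in the style that browsers produce.
copied_selector = "#summary > ul > li:nth-child(2)"

soup = BeautifulSoup(html, "html.parser")
match = soup.select_one(copied_selector)

# select_one returns None when nothing matches, so guard before reading text.
price = match.get_text(strip=True) if match else None
print(price)
```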
Upvotes: 0