Reputation: 41
I am trying to extract the text from one part of an HTML page. When I use
find('dl', class_="definition-list")
or
findNext('dl', class_="definition-list")
it returns nothing.
An example URL (I loop over several of these): https://www.finn.no/realestate/homes/ad.html?finnkode=216521178
I only want to get this part:
Upvotes: 1
Views: 24
Reputation: 20052
The price data is in the third div of class panel. You can easily get that with .find_all().
Here's how:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/78.0.3904.108 Safari/537.36',
}
product_url = "https://www.finn.no/realestate/homes/ad.html?finnkode=216521178"
# Pass the headers along with the request.
page_content = requests.get(product_url, headers=headers).content
# The price data is in the third div with class "panel" (index 2).
price_panel = BeautifulSoup(page_content, 'lxml').find_all("div", class_="panel")[2]
# Print each text fragment of the panel on its own line.
print(price_panel.get_text(separator="\n", strip=True))
Output:
Prisantydning
3 790 000 kr
Fellesgjeld
72 827 kr
Omkostninger
108 120 kr
Totalpris
3 970 947 kr
Felleskost/mnd.
3 128 kr
This should work for any ad URL on that site, as long as the page layout is the same. :-]
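If you need the values as numbers rather than as printed text, one option is to pair each label with the line that follows it. The sketch below is only an illustration: it works on the output lines shown above (copied into a list so it runs without a network request), and the helper names parse_kr and pair_prices are made up for this example. It assumes labels and values strictly alternate; if a row on the page has no value, the pairing would shift.

```python
import re

# Output lines copied from the answer above: labels and values alternate.
lines = [
    "Prisantydning", "3 790 000 kr",
    "Fellesgjeld", "72 827 kr",
    "Omkostninger", "108 120 kr",
    "Totalpris", "3 970 947 kr",
    "Felleskost/mnd.", "3 128 kr",
]


def parse_kr(value: str) -> int:
    """Turn a string like '3 790 000 kr' into the integer 3790000."""
    # Drop everything that is not a digit (spaces, non-breaking spaces, 'kr').
    digits = re.sub(r"\D", "", value)
    return int(digits)


def pair_prices(text_lines: list) -> dict:
    """Pair each label (even index) with the value after it (odd index)."""
    labels = text_lines[::2]
    values = text_lines[1::2]
    return {label: parse_kr(value) for label, value in zip(labels, values)}


prices = pair_prices(lines)
print(prices["Totalpris"])  # prints 3970947
```

With the scraping code above, the same list could be built by splitting the printed text on newlines before passing it to pair_prices.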
Upvotes: 1