zerocool

Reputation: 369

Subset a bs4.element.ResultSet by string match

I am trying to reference a particular unnamed <p> element in HTML. I have already identified the correct <div>, which is stored in res.

What I would like to do now is find the <p> whose <span> contains the word "Country" and get its value (here "Germany") as a string. Matching by position doesn't work, unfortunately, because the position changes from page to page.

res = res.find(name="div", attrs={"class": "item-result-text"}).find_all(name="p")
print(res)

Result:

[<p><span>Country: </span>Germany</p>, <p><span>Coly: </span> Print</p>]
print(type(res))

Result: <class 'bs4.element.ResultSet'>

Upvotes: 0

Views: 467

Answers (1)

bigbounty

Reputation: 17368

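I would approach this differently, although I'm not sure it is exactly what you are looking for: build a dictionary that maps each label (the text before the colon) to its value (the text after it), then look up "Country".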

In [4]: from bs4 import BeautifulSoup
In [5]: a = "<p><span>Country: </span>Germany</p>, <p><span>Coly: </span> Print</p>"
In [6]: soup = BeautifulSoup(a, "lxml")
In [14]: values = {k.strip(): v.strip() for k, v in (i.text.split(":", 1) for i in soup.find_all("p"))}

In [15]: values["Country"]
Out[15]: 'Germany'
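
If you would rather keep working with the ResultSet you already have, here is a minimal sketch that loops over the <p> elements and checks each <span>. The helper name value_for_label is hypothetical (it is not part of bs4), the HTML string is just the sample from the question wrapped in its <div>, and I use the built-in "html.parser" so that lxml is not required. It assumes each <p> holds one <span> label followed by the value.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question, wrapped in the <div> it came from.
html = """
<div class="item-result-text">
  <p><span>Country: </span>Germany</p>
  <p><span>Coly: </span> Print</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
res = soup.find("div", attrs={"class": "item-result-text"}).find_all("p")


def value_for_label(paragraphs, label):
    """Hypothetical helper: return the text that follows the <span>
    whose text contains `label`, or None if no paragraph matches."""
    for p in paragraphs:
        span = p.find("span")
        if span is not None and label in span.get_text():
            # Drop the span's own text from the paragraph text; what
            # remains is the value.
            return p.get_text().replace(span.get_text(), "", 1).strip()
    return None


country = value_for_label(res, "Country")
print(country)
```

Because the match is on the <span> text rather than on position, this should keep working when the order of the paragraphs changes between pages. It returns None when no label matches, so check for that before using the result.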

Upvotes: 1
