Reputation: 14039
Inside a page, I have the following HTML:
<div class="ProfileDesc">
<p>
<span class="Title">Name</span>
<span>Tom Ready</span>
</p>
<p>
<span class="Title">Born</span>
<span>
<bxi> 10 Jan 1960</bxi>
<p>
<span class="Title">Death</span>
<span>
<bxi> 01 Jun 2019</bxi>
</span>
</p>
</div>
The following code works for extracting the ProfileDesc div from the whole page:
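from bs4 import BeautifulSoup
# page is assumed to be the response from an earlier requests.get(...) call
soup = BeautifulSoup(page.content, 'html.parser')
mydivs = soup.find("div", {"class": "ProfileDesc"})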
I want the following output:
Name: Tom Ready
Born: 10 Jan 1960
Death: 01 Jun 2019
How do I extract these after finding the ProfileDesc?
Upvotes: 0
Views: 56
Reputation: 2256
When you're pretty sure about the DOM structure:
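mydivs = soup.find("div", {"class": "ProfileDesc"})
for element in mydivs.find_all("p"):
    title = element.find("span")  # the label span, e.g. <span class="Title">Name</span>
    content = title.find_next("span")  # the next <span> in document order holds the value
    print("%s : %s" % (title.text.strip(), content.text.strip()))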
Output:
Name : Tom Ready
Born : 10 Jan 1960
Death : 01 Jun 2019
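A self-contained sketch of the same idea, run on a corrected copy of the question's HTML (the Born <span> and <p> closed). The correction matters: with 'html.parser', BeautifulSoup does not auto-close the unfinished Born paragraph, so the Death <p> ends up nested inside the Born value <span>, and find_next("span") for Born would return text that also contains the Death entry. The function name title_value_pairs is only illustrative, and the sketch collects the pairs into a dict instead of printing them directly.

```python
from bs4 import BeautifulSoup

# The question's HTML with the Born <span> and <p> closed
# (the snippet as posted leaves both open).
FIXED_HTML = """
<div class="ProfileDesc">
<p><span class="Title">Name</span><span>Tom Ready</span></p>
<p><span class="Title">Born</span><span><bxi> 10 Jan 1960</bxi></span></p>
<p><span class="Title">Death</span><span><bxi> 01 Jun 2019</bxi></span></p>
</div>
"""


def title_value_pairs(html):
    """Pair each span.Title with the next <span> after it in document order."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="ProfileDesc")
    pairs = {}
    for p in div.find_all("p"):
        title = p.find("span", class_="Title")
        value = title.find_next("span")
        # get_text(strip=True) trims the leading space inside <bxi>
        pairs[title.get_text(strip=True)] = value.get_text(strip=True)
    return pairs


for key, value in title_value_pairs(FIXED_HTML).items():
    print(f"{key}: {value}")
```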
Upvotes: 1
Reputation: 116
Your HTML has no closing </p> tag after "10 Jan 1960". With that fixed, you can look up each label span by its text and take its paragraph's text:
# Find each label span by its text, then strip the label from its parent <p>'s text
name = soup.find('span', string='Name').parent.text.replace('Name', '').strip()
born = soup.find('span', string='Born').parent.text.replace('Born', '').strip()
death = soup.find('span', string='Death').parent.text.replace('Death', '').strip()
print(f'Name: {name}')
print(f'Born: {born}')
print(f'Death: {death}')
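A variation on the same lookup, as a minimal sketch assuming the HTML is well-formed: instead of removing the label with .replace() (which would also remove that word if it ever appeared inside the value), take the value <span> that follows the label span as its sibling. field_value is a hypothetical helper name; it returns None when a label is missing rather than raising AttributeError on None.

```python
from bs4 import BeautifulSoup

# Corrected copy of the question's HTML (Born <span> and <p> closed).
HTML = """
<div class="ProfileDesc">
<p><span class="Title">Name</span><span>Tom Ready</span></p>
<p><span class="Title">Born</span><span><bxi> 10 Jan 1960</bxi></span></p>
<p><span class="Title">Death</span><span><bxi> 01 Jun 2019</bxi></span></p>
</div>
"""


def field_value(soup, label):
    """Return the text of the <span> right after the span.Title whose text is `label`."""
    title = soup.find("span", class_="Title", string=label)
    if title is None:
        return None
    # find_next_sibling skips whitespace strings and returns the next <span> tag
    value = title.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(HTML, "html.parser")
for label in ("Name", "Born", "Death"):
    print(f"{label}: {field_value(soup, label)}")
```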
Upvotes: 2
Reputation: 8302
Try this:
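keys_ = set()  # skip keys already printed (with html.parser, the unclosed Born <p> also contains the Death entry)
for p in mydivs.find_all("p"):
    ss = list(p.stripped_strings)
    # pair alternating strings: label, value, label, value, ...
    for k, v in zip(ss[::2], ss[1::2]):
        if k in keys_:
            continue
        keys_.add(k)
        print(k, ":", v)
Output: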
Name : Tom Ready
Born : 10 Jan 1960
Death : 01 Jun 2019
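Because this approach only reads the text strings in order, it copes with the question's HTML exactly as posted. A self-contained sketch using 'html.parser' follows; profile_fields is a hypothetical name, and dict.setdefault stands in for the keys_ set (it keeps the first value seen for each label).

```python
from bs4 import BeautifulSoup

# The HTML exactly as posted in the question (Born <p> and <span> left unclosed).
QUESTION_HTML = """
<div class="ProfileDesc">
<p>
<span class="Title">Name</span>
<span>Tom Ready</span>
</p>
<p>
<span class="Title">Born</span>
<span>
<bxi> 10 Jan 1960</bxi>
<p>
<span class="Title">Death</span>
<span>
<bxi> 01 Jun 2019</bxi>
</span>
</p>
</div>
"""


def profile_fields(html):
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="ProfileDesc")
    fields = {}
    for p in div.find_all("p"):
        strings = list(p.stripped_strings)
        # Pair alternating strings: label, value, label, value, ...
        for key, value in zip(strings[::2], strings[1::2]):
            # The nested Death strings are seen twice (inside Born's <p> and in
            # their own <p>); setdefault keeps only the first value.
            fields.setdefault(key, value)
    return fields


for key, value in profile_fields(QUESTION_HTML).items():
    print(f"{key}: {value}")
```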
Upvotes: 1