Reputation: 485
The webpage I'm scraping contains many titles, and I need to identify them to set a value in my database. The problem is that those titles don't have a specific ID or class.
They follow this pattern:
<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>
<p ALIGN="CENTER"><font face="Arial" SIZE="2"><a name="tituloivcapituloisecaoiii"></a>
<b><span style="text-transform: uppercase">Seção III<br>
DA CÂMARA DOS DEPUTADOS</span></b></font></p>
One attribute that identifies them is the span's style: text-transform: uppercase.
How can I check whether a <p> contains a title?
That's my current code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(f, 'html.parser')
# Drop anchors and struck-through text before inspecting paragraphs
for tag in soup.find_all(['a', 'strike']):
    tag.decompose()
allp = soup.find_all('p')
for p in allp:
    print(p)
Upvotes: 0
Views: 32
Reputation:
Once you have parsed the HTML, you can search within the tags using any defining attribute. Here, the style text-transform: uppercase can be used.
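from bs4 import BeautifulSoup

soup = BeautifulSoup(f, 'html.parser')
for p in soup.find_all("p"):
    span = p.span  # first <span> inside this <p>, or None
    # .get() avoids a KeyError when the span has no style attribute
    if span is not None and span.get("style") == "text-transform: uppercase":
        title = p.text
        print(title)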
>>>Seção IIDAS ATRIBUIÇÕES DO CONGRESSO NACIONAL
This finds every <p> tag whose first <span> has style == "text-transform: uppercase" and prints its text.
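If the style string varies across the page (for example text-transform:uppercase without the space, or with different casing), an exact string comparison will miss some titles. Below is a minimal sketch of a more tolerant check, assuming the titles always carry that style on a <span> somewhere inside the <p>. The names HTML, UPPERCASE_STYLE and extract_titles are made up for illustration, and the sample markup is copied from the question plus one ordinary paragraph added for contrast.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the first <p> is from the question, the second is an
# illustrative non-title paragraph added for contrast.
HTML = """
<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>
<p>Ordinary paragraph, not a title.</p>
"""

# Tolerates "text-transform:uppercase", extra spaces and mixed case.
UPPERCASE_STYLE = re.compile(r"text-transform\s*:\s*uppercase", re.IGNORECASE)


def extract_titles(html):
    """Return the text of every <p> that contains an uppercase-styled <span>."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for p in soup.find_all("p"):
        # find() returns None when no <span> in this <p> matches the pattern
        if p.find("span", style=UPPERCASE_STYLE) is not None:
            # separator=" " keeps the text around <br> from running together
            titles.append(p.get_text(separator=" ", strip=True))
    return titles


print(extract_titles(HTML))
```

Passing a compiled regular expression as the style filter makes BeautifulSoup search the attribute value rather than compare it exactly, and get_text(separator=" ", strip=True) yields "Seção II DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL" with a space where the <br> was.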
Upvotes: 2