mr.abdo

Reputation: 485

How to identify an element by its inline CSS style attribute

The webpage I'm scraping contains a lot of titles, and I need to identify them so I can set a value in my database. The problem is that those titles don't have a specific ID or class.

They follow this pattern:

<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>


<p ALIGN="CENTER"><font face="Arial" SIZE="2"><a name="tituloivcapituloisecaoiii"></a>
<b><span style="text-transform: uppercase">Seção III<br>
DA CÂMARA DOS DEPUTADOS</span></b></font></p>

One attribute that identifies them is the inline style text-transform: uppercase.

How can I check whether a p contains a title?

That's my current code:

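from bs4 import BeautifulSoup

# f is an open file handle (or a string) holding the page's HTML
soup = BeautifulSoup(f, 'html.parser')

# Remove anchors and struck-through text before processing
for tag in soup.find_all(['a', 'strike']):
    tag.decompose()

for p in soup.find_all('p'):
    print(p)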

Upvotes: 0

Views: 32

Answers (1)

user10597469

Once you have parsed the HTML, you can filter tags by any attribute that distinguishes them. In this case, the inline style text-transform: uppercase on the <span> inside each title works.

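from bs4 import BeautifulSoup

soup = BeautifulSoup(f, 'html.parser')
for p in soup.find_all("p"):
    span = p.find("span")
    # Skip paragraphs with no <span>, or whose <span> has no style attribute
    if span is not None and span.get("style") == "text-transform: uppercase":
        title = p.text
        print(title)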

>>>Seção IIDAS ATRIBUIÇÕES DO CONGRESSO NACIONAL

This will find all <p> tags containing <span> tags where style=="text-transform: uppercase" and print their associated text.
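An exact string comparison on the style attribute is brittle, since it fails if the page writes the declaration with different spacing, letter case, a trailing semicolon, or alongside other declarations. The sketch below is one possible way to loosen the match, assuming BeautifulSoup 4 is installed. The names is_uppercase_style and find_titles, the "Art. 48" paragraph, and the alternate spelling of the style in the second title are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question. The plain "Art. 48" paragraph and the
# differently spelled style on the second title are made up for this sketch.
HTML = """
<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>
<p>Art. 48. Ordinary paragraph text.</p>
<p ALIGN="CENTER"><font face="Arial" SIZE="2"><a name="tituloivcapituloisecaoiii"></a>
<b><span style="TEXT-TRANSFORM:uppercase;">Seção III<br>
DA CÂMARA DOS DEPUTADOS</span></b></font></p>
"""


def is_uppercase_style(style):
    """Return True if an inline style string declares text-transform: uppercase."""
    if not style:  # the attribute may be missing, in which case style is None
        return False
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if (name.strip().lower() == "text-transform"
                and value.strip().lower() == "uppercase"):
            return True
    return False


def find_titles(html):
    """Collect the text of every <p> that holds an uppercase-styled <span>."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for p in soup.find_all("p"):
        # A function passed as an attribute filter receives the attribute value
        span = p.find("span", style=is_uppercase_style)
        if span is not None:
            # Join the pieces around <br> with a space and trim whitespace
            titles.append(span.get_text(" ", strip=True))
    return titles


titles = find_titles(HTML)
print(titles)
```

Using get_text(" ", strip=True) instead of p.text puts a space where the <br> was, so the section number and the section name do not run together.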

Upvotes: 2
