How to identify a CSS inline attribute

Question

The webpage I'm scraping contains a lot of titles, and I need to identify them in order to set a value in my database. The problem is that those titles don't have a specific ID or class.

They follow this pattern:



Seção II

DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL



Seção III

DA CÂMARA DOS DEPUTADOS

One attribute that identifies them is: text-transform: uppercase.

How can I check whether a p contains a title?

That's my current code:

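from bs4 import BeautifulSoup

# f is assumed to be the already-opened HTML file (or a string of HTML)
soup = BeautifulSoup(f, 'html.parser')

# Remove links and struck-through text before reading the paragraphs
for tag in soup.find_all(['a', 'strike']):
    tag.decompose()

allp = soup.find_all('p')
for p in allp:
    print(p)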

user10597469 · Accepted Answer

Once you have selected the elements by tag name, you can filter them by any distinguishing attribute. In this case, the inline style text-transform: uppercase can be used.

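from bs4 import BeautifulSoup

soup = BeautifulSoup(f, 'html.parser')
for p in soup.find_all("p"):
    span = p.span
    # Skip paragraphs that have no span, or whose span has no inline style
    if span is not None and span.get("style") == "text-transform: uppercase":
        title = p.text
        print(title)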

>>>Seção IIDAS ATRIBUIÇÕES DO CONGRESSO NACIONAL

This will find all p tags containing span tags where style == "text-transform: uppercase" and print their associated text.
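The string comparison above only matches when the style attribute is written exactly as text-transform: uppercase. If the page also uses variants such as different spacing, different letter case, or additional declarations in the same attribute, a more tolerant check may help. The following is a minimal sketch under that assumption; parse_inline_style and is_uppercase_style are hypothetical helper names, not part of BeautifulSoup, and the simple split on ";" does not handle values that themselves contain semicolons.

```python
def parse_inline_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'} (names and values lower-cased and trimmed)."""
    declarations = {}
    # A missing style attribute (None) is treated as an empty string
    for part in (style or "").split(";"):
        if ":" not in part:
            continue  # ignore empty or malformed fragments
        name, _, value = part.partition(":")
        declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def is_uppercase_style(style):
    """Return True when the inline style sets text-transform to uppercase."""
    return parse_inline_style(style).get("text-transform") == "uppercase"


print(is_uppercase_style("text-transform: uppercase"))
print(is_uppercase_style("TEXT-TRANSFORM:uppercase; color: red"))
print(is_uppercase_style("color: red"))
print(is_uppercase_style(None))
```

In the loop above, the equality test could then be replaced by is_uppercase_style(span.get("style")), which also copes with spans that have no style attribute at all.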
