Sarotobi

Reputation: 831

Python bs4 how to find inline style from find_all()

I am trying to get the height value from an inline style attribute using Python and BeautifulSoup. I managed to get all div elements with a specific class, but I cannot figure out how to extract the height value from the inline style shown in the output. My code so far is below.

import requests
from bs4 import BeautifulSoup

URL = "https://exampleonly.org/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
samples = soup.find_all("div", {"class": "example"})

for sample in samples:
    print(sample)

My output is:

<div class="example" style="height: 50%;"></div>
<div class="example" style="height: 20%;"></div>
<div class="example" style="height: 40%;"></div>
 

What I want to get are the values 50%, 20%, and 40%.

Upvotes: 0

Views: 297

Answers (2)

from bs4 import BeautifulSoup
html = '''<div class="example" style="height: 50%;"></div>
<div class="example" style="height: 20%;"></div>
<div class="example" style="height: 40%;"></div>'''


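soup = BeautifulSoup(html, 'lxml')
# x['style'] is e.g. 'height: 50%;'. split() gives ['height:', '50%;'],
# and [:-1] drops the trailing ';'. This assumes height is the only declaration.
goal = [x['style'].split()[1][:-1] for x in soup.select('.example')]
print(goal)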

Output:

['50%', '20%', '40%']
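
If the style attribute might hold more than one declaration, or be written without the space or the trailing semicolon, the positional split above would pick the wrong piece. A minimal sketch of a more tolerant approach, assuming plain `property: value` pairs separated by semicolons (the helper name `parse_inline_style` and the sample strings are made up for illustration, and values that themselves contain a `;` are not handled):

```python
def parse_inline_style(style):
    """Split an inline style string into a {property: value} dict."""
    declarations = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty pieces, e.g. the one after a trailing ';'
        prop, _, value = declaration.partition(":")
        declarations[prop.strip().lower()] = value.strip()
    return declarations


# Hypothetical style strings, as they would come from tag["style"]
styles = ["height: 50%;", "height:20%", "width: 10px; height: 40%;"]
heights = [parse_inline_style(s).get("height") for s in styles]
print(heights)  # ['50%', '20%', '40%']
```

With BeautifulSoup the same helper could be applied to each tag's `x['style']`; `.get("height")` returns `None` when a tag has no height declaration.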

Upvotes: 0

QHarr

Reputation: 84465

You can combine class and attribute selectors to target elements, then extract the style attribute. Use re to extract the desired values:

from bs4 import BeautifulSoup as bs
import re

html = '''
<div class="example" style="height: 50%;"></div>
<div class="example" style="height: 20%;"></div>
<div class="example" style="height: 40%;"></div>
'''
soup = bs(html, 'lxml')

for i in soup.select('.example[style*=height]'):
    print(re.search(r'(\d+%)', i['style']).group(1))
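
One caveat, as far as I can tell: `[style*=height]` is a substring match, so it also matches `line-height` or `min-height`, and `(\d+%)` takes the first percentage in the attribute regardless of which property it belongs to. A hedged variant that anchors the pattern to the `height` property and skips tags with no match (the sample HTML is altered for illustration, and the `HEIGHT_RE` name is my own; `html.parser` is used only to avoid needing lxml):

```python
import re
from bs4 import BeautifulSoup

# Sample markup changed from the question to show the edge cases
html = '''
<div class="example" style="height: 50%;"></div>
<div class="example" style="line-height: 2; height: 20%;"></div>
<div class="example" style="width: 40%;"></div>
'''
soup = BeautifulSoup(html, "html.parser")

# Match "height" only at the start of a declaration, so "line-height" is skipped.
HEIGHT_RE = re.compile(r"(?:^|;)\s*height\s*:\s*([^;]+)", re.IGNORECASE)

heights = []
for tag in soup.select(".example[style]"):
    match = HEIGHT_RE.search(tag["style"])
    if match:  # tags without a height declaration are skipped
        heights.append(match.group(1).strip())

print(heights)  # ['50%', '20%']
```

The third div is left out because its only percentage belongs to `width`, which the original pattern would have returned as if it were a height.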

Upvotes: 1
