Satyam Chaurasia

Reputation: 5

Scrape data using Beautiful Soup

While scraping data with BeautifulSoup, I ran into the HTML below, which contains two <h2> tags, and I want to extract the data from the second <h2> tag. How can I do this? More generally, if the same tag appears several times and I want to extract data from just one of them, how can I do that?

Code:

<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>

<h2>Recipe For Making Soft Idlis</h2>

I had thought of extracting the data using a keyword instead of relying on the tag alone. For example, I can use the <h2> tag and the keyword Recipe to find the data of the second <h2> tag.

Upvotes: 0

Views: 200

Answers (2)

Keyur Potdar

Reputation: 7248

for example I can use <h2> tag and use keyword Recipe to find the data of second <h2> tag

Yes, you can do exactly that. You can use the Python re (Regular Expression) module to match partial text inside a tag.

From the documentation:

If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its search() method.

Demo:

>>> import re
>>> from bs4 import BeautifulSoup
>>> 
>>> html = '''<h2>Video Instructions For Making Soft Idlis</h2>
    <div class="embed-responsive embed-responsive-16by9">
    <iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
    </div>

    <h2>Recipe For Making Soft Idlis</h2>'''
>>>
>>> soup = BeautifulSoup(html, 'html.parser')
>>> tag = soup.find('h2', string=re.compile('Recipe'))  # string= is the current name of the older text= argument
>>> tag
<h2>Recipe For Making Soft Idlis</h2>
>>> tag.text
'Recipe For Making Soft Idlis'
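
As a hedged alternative to the regular-expression filter above, here is a minimal sketch that does the keyword check in plain Python. The helper name find_heading_by_keyword and the shortened HTML are assumptions made for illustration, not part of Beautiful Soup. It compares against get_text(), so it should also match a heading whose text is wrapped in nested tags, a case where filtering on the tag's string may not match.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (iframe omitted).
html = """<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9"></div>

<h2>Recipe For Making Soft Idlis</h2>"""

soup = BeautifulSoup(html, "html.parser")


def find_heading_by_keyword(soup, keyword, tag_name="h2"):
    """Hypothetical helper: return the first <tag_name> whose text
    contains keyword (case-insensitive), or None if nothing matches."""
    keyword = keyword.lower()
    for tag in soup.find_all(tag_name):
        if keyword in tag.get_text().lower():
            return tag
    return None


heading = find_heading_by_keyword(soup, "recipe")
print(heading.get_text())  # Recipe For Making Soft Idlis
```

The helper returns None when no heading contains the keyword, so check the result before calling get_text() on real pages.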

Upvotes: 0

Vinícius Figueiredo

Reputation: 6518

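If you know which h2 you want by its position, you simply need to index the list returned by the .findAll method (spelled find_all in current Beautiful Soup; findAll is the older alias). Indexing is zero-based, so the second h2 is at index 1: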

from bs4 import BeautifulSoup
soup = BeautifulSoup('''<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>

<h2>Recipe For Making Soft Idlis</h2>''', "html.parser")

>>> soup.findAll("h2")[1]
<h2>Recipe For Making Soft Idlis</h2>
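
Indexing directly raises IndexError when the page has fewer matching tags than expected. A minimal sketch of a guarded version follows; the helper name nth_tag and the shortened HTML are assumptions made for illustration, not a Beautiful Soup API.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (iframe omitted).
html = """<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9"></div>

<h2>Recipe For Making Soft Idlis</h2>"""

soup = BeautifulSoup(html, "html.parser")


def nth_tag(soup, name, index):
    """Hypothetical helper: return the tag at zero-based position `index`
    among all <name> tags, or None if there are not that many."""
    tags = soup.find_all(name)
    if 0 <= index < len(tags):
        return tags[index]
    return None


second = nth_tag(soup, "h2", 1)
print(second.get_text())  # Recipe For Making Soft Idlis

missing = nth_tag(soup, "h2", 5)  # only two <h2> tags exist, so this is None
```

Position-based lookup is fragile if the page layout changes, so the keyword approach may be the safer choice when the heading text is stable.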

Upvotes: 1
