Reputation: 5
While scraping data using BeautifulSoup
In this HTML code there are two <h2> tags, but I want to extract data from the second <h2> tag. How can I do this?
More generally, if there are multiple tags of the same kind and I want to extract data from one particular tag, how can I do that?
Code:
<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>
<h2>Recipe For Making Soft Idlis</h2>
I had thought of extracting the data using a keyword instead of the tag alone. For example, I could look for an <h2> tag containing the keyword Recipe to find the second <h2> tag.
Upvotes: 0
Views: 200
Reputation: 7248
"for example I can use <h2> tag and use keyword Recipe to find the data of second <h2> tag"
Yes, you can do exactly that. You can use Python's re (regular expression) module to match partial text inside a tag.
From the documentation:
If you pass in a regular expression object, Beautiful Soup will filter against that regular expression using its search() method.
Demo:
>>> import re
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>
<h2>Recipe For Making Soft Idlis</h2>'''
>>>
>>> soup = BeautifulSoup(html, 'html.parser')
>>> tag = soup.find('h2', string=re.compile('Recipe'))
>>> tag
<h2>Recipe For Making Soft Idlis</h2>
>>> tag.text
'Recipe For Making Soft Idlis'
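Because the filter uses search(), the pattern can match anywhere in the tag's text, not only at the start. The sketch below, reusing the same HTML, contrasts a few variations; the variable names are illustrative, and re.IGNORECASE is just one option in case the page's capitalisation varies. It also shows that a plain function can be passed as the string filter instead of a regex.

```python
import re
from bs4 import BeautifulSoup

html = '''<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>
<h2>Recipe For Making Soft Idlis</h2>'''

soup = BeautifulSoup(html, 'html.parser')

# search() semantics: 'Soft' appears mid-string in both headings, so both match
both = soup.find_all('h2', string=re.compile('Soft'))

# Anchoring with ^ requires the keyword at the very start of the text
recipe = soup.find('h2', string=re.compile(r'^Recipe'))

# Case-insensitive variant of the same keyword search
recipe_ci = soup.find('h2', string=re.compile('recipe', re.IGNORECASE))

# A function also works as a filter; it receives the tag's .string (possibly None)
recipe_fn = soup.find('h2', string=lambda s: s is not None and 'Recipe' in s)

print(len(both))      # 2
print(recipe.text)    # Recipe For Making Soft Idlis
```

If a keyword might appear in several headings, find() returns only the first match, so an anchored or more specific pattern is the safer choice.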
Upvotes: 0
Reputation: 6518
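If you know which h2 you want based on its position, simply use that position as an index into the list returned by the .findAll method (.find_all in current Beautiful Soup naming; findAll is the older alias). Indexes start at 0, so the second <h2> is index 1: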
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>
<h2>Recipe For Making Soft Idlis</h2>''', "html.parser")
>>> # find_all returns the <h2> tags in document order; [1] is the second one
>>> soup.find_all("h2")[1]
<h2>Recipe For Making Soft Idlis</h2>
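For the general case ("any one of the tags"), a plain index raises IndexError when the page has fewer tags than expected. A minimal sketch of a guarded lookup follows; nth_tag is a hypothetical helper name, not part of Beautiful Soup, and it uses 1-based positions to match how you would describe "the second tag". The limit argument makes find_all stop once it has enough matches.

```python
from bs4 import BeautifulSoup

html = '''<h2>Video Instructions For Making Soft Idlis</h2>
<div class="embed-responsive embed-responsive-16by9">
<iframe class="embed-responsive-item" src="https://www.youtube.com/embed/p3uF3LK5734?rel=0" allowfullscreen="allowfullscreen"></iframe>
</div>
<h2>Recipe For Making Soft Idlis</h2>'''


def nth_tag(soup, name, n):
    """Hypothetical helper: return the n-th (1-based) `name` tag, or None if absent."""
    if n < 1:
        # limit=0 would mean "no limit" to find_all, so reject it explicitly
        raise ValueError("n must be 1 or greater")
    # Stop searching as soon as n matching tags have been found
    matches = soup.find_all(name, limit=n)
    return matches[n - 1] if len(matches) >= n else None


soup = BeautifulSoup(html, 'html.parser')

second = nth_tag(soup, 'h2', 2)
print(second.text)              # Recipe For Making Soft Idlis
print(nth_tag(soup, 'h2', 3))   # None

# To inspect every match with its position instead of picking one:
for i, h2 in enumerate(soup.find_all('h2'), start=1):
    print(i, h2.get_text(strip=True))
```

Returning None rather than raising lets a scraper skip pages that lack the expected heading; if a missing tag should be treated as an error, indexing directly (as above) is simpler.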
Upvotes: 1