Reputation: 27
Hello, this is dummy code, but I want to get the Doodle text from the second <a>:
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
These are my failed attempts:
soup.find('div', {'class' : 'test'}) #1
soup.find('div', {'class' : 'test'}).next_sibling #2
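The sketch below shows why these two attempts miss the link. It parses the sample HTML with the built-in 'html.parser'; that parser choice is an assumption, since the question does not name one. Attempt #1 stops at the <div> itself. Attempt #2 steps sideways past the <div> to the whitespace that follows it, instead of going into it.

```python
from bs4 import BeautifulSoup

html = '''
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
'''
# Assumption: the built-in html.parser (the question does not say which parser is used)
soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div', {'class': 'test'})

# Attempt #1: find() returns the <div> Tag itself, not the text of a link inside it
print(div.name)  # div

# Attempt #2: next_sibling moves sideways to the node *after* the <div>,
# which here is just the newline text that follows </div>
print(div.next_sibling == '\n')  # True

# The <a> tags are children of the <div>, so search inside it instead
links = div.find_all('a')
print(links[1].get_text(strip=True))  # Doodle
```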
Upvotes: 1
Views: 102
Reputation: 46
Check this question: beautifulsoup - extracting link within a div
And try this:
for div in soup.find_all('div', {'class': 'test'}):
    # find_all() returns a list, so [1] is the second <a> (zero-based)
    a = div.find_all('a')[1]
    print(a.text.strip())
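The loop visits every <div class="test">. If one of those divs contains fewer than two links, [1] raises IndexError. Here is a minimal sketch of a guarded version. The second <div> in the sample input is hypothetical and not part of the question's HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the second div has only one link
html = '''
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
<div class="test">
<a href="https://example.com">Only one</a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

texts = []
for div in soup.find_all('div', {'class': 'test'}):
    links = div.find_all('a')
    # Skip divs that have no second <a> instead of raising IndexError
    if len(links) > 1:
        texts.append(links[1].text.strip())

print(texts)  # ['Doodle']
```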
Upvotes: 1
Reputation: 25196
There are many ways to reach your goal. As always, what matters most is the pattern or structure of the HTML you have to work with.
Assuming the <a> stays in second position, you could use a CSS selector like :nth-of-type(2):
soup.select_one('div.test a:nth-of-type(2)').get_text(strip=True)
#Doodle
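:nth-of-type(2) counts only sibling <a> elements, so it still finds the second link when other tags sit in front of it. :nth-child(2), by contrast, counts every child element. The sketch below illustrates the difference using a hypothetical <span> placed before the links; that <span> does not appear in the question's HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML with an extra <span> first
html = '''
<div class="test">
<span>label</span>
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Second <a> among its <a> siblings -> the Doodle link
by_type = soup.select_one('div.test a:nth-of-type(2)').get_text(strip=True)
# An <a> that is the second child element overall -> the empty first link
by_child = soup.select_one('div.test a:nth-child(2)').get_text(strip=True)

print(repr(by_type))   # 'Doodle'
print(repr(by_child))  # ''
```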
Assuming it is always the last one, you could also index the ResultSet returned by find_all():
soup.find('div', {'class' : 'test'}).find_all('a')[-1].get_text(strip=True)
#Doodle
Or, again using CSS selectors, for the last one:
soup.select_one('div.test a:last-of-type').get_text(strip=True)
#Doodle
soup.select('div.test a')[-1].get_text(strip=True)
#Doodle
Example:
from bs4 import BeautifulSoup
html = '''
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select_one('div.test a:nth-of-type(2)').get_text(strip=True))
print(soup.find('div', {'class' : 'test'}).find_all('a')[-1].get_text(strip=True))
Output:
Doodle
Doodle
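The nth-of-type(2) approach and the "last one" approaches only agree while the wanted <a> is both the second and the last link. The short sketch below adds a hypothetical third link, which is not in the question's HTML, to show how the results diverge.

```python
from bs4 import BeautifulSoup

# Hypothetical variant with a third link after the Doodle one
html = '''
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
<a href="https://example.com/more"> More </a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# "Second link" approach
second = soup.select_one('div.test a:nth-of-type(2)').get_text(strip=True)
# "Last link" approaches
last_css = soup.select_one('div.test a:last-of-type').get_text(strip=True)
last_idx = soup.find('div', {'class': 'test'}).find_all('a')[-1].get_text(strip=True)

print(second)    # Doodle
print(last_css)  # More
print(last_idx)  # More
```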
Upvotes: 1
Reputation: 517
doodletext = soup.find('div', {'class' : 'test'})
# .text joins all text inside the div; strip() removes the surrounding newlines and spaces
print(doodletext.text.strip())
This works only for this example, because the first <a> has no text. If you need to find Doodle and there is other text around it, you may need to use the split() method to drill down to the specific string of text you are looking for.
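The following sketch shows what .text returns for this <div> and how split() breaks it into words. As a hypothetical stand-in for "other text around", it then gives the first link some text ("Google") by setting its .string.

```python
from bs4 import BeautifulSoup

html = '''
<div class="test">
<a href="www.google.com"></a>
<a href="https://www.google.com/doodles"> Doodle </a>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', {'class': 'test'})

# .text keeps the surrounding newlines and spaces
print(repr(div.text))

# split() with no argument splits on any run of whitespace and drops empty strings
words_now = div.text.split()
print(words_now)  # ['Doodle']

# Hypothetical: once the first link has text too, .text mixes both strings,
# so you still have to pick the right word yourself
div.a.string = 'Google'
words_after = div.text.split()
print(words_after)  # ['Google', 'Doodle']
```

This is why the text-based approach is fragile: the position of Doodle in the split list depends on how much other text the <div> contains.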
Upvotes: 1