Reputation: 135
I am confused about how Beautiful Soup works when you want to grab a child of a tag. I have the following HTML code:
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
I want to grab the value of the src attribute. I am using the following code:
soup = BeautifulSoup(file_)
for x in soup.find('div', attrs={'class':'media item avatar profile'}).findNext('img'):
    print x
This prints the whole img tag. How do I select only the src?
Thank you.
Upvotes: 5
Views: 20800
Reputation: 80436
I think you would want something like:
soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']
In [1]: from bs4 import BeautifulSoup
In [2]: html = """\
...: <div class="media item avatar profile">
...: <a href="http://..." class="media-link action-medialink">
...: <img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
...: </a>
...: </div>"""
In [3]: soup = BeautifulSoup(html)
In [4]: soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']
Out[4]: 'http://...jpeg'
Upvotes: 3
Reputation: 1124758
src is an attribute of the tag. Once you have the tag, access the attributes as you would dictionary keys; you only found the a tag so you need to navigate to the contained img tag too:
for x in soup.find_all('div', attrs={'class':'media item avatar profile'}):
    print x.a.img['src']
Your code used findNext(), which returns a tag object; looping over that gives you the children, so x was the img object. I changed this to be a bit more direct and clearer. x is now the div, and we navigate directly to the first a and contained img tag.
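To illustrate the dictionary-style access described above, here is a minimal sketch using the HTML from the question. It assumes BeautifulSoup 4 (the bs4 package) and the built-in "html.parser" backend; the attribute name data-missing is hypothetical and only there to show what happens when an attribute is absent.

```python
from bs4 import BeautifulSoup

html = """
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate from the div to the first <a>, then to the first <img> inside it.
img = soup.find("div", attrs={"class": "media item avatar profile"}).a.img

src = img["src"]                   # subscript access; raises KeyError if the attribute is absent
missing = img.get("data-missing")  # hypothetical attribute; .get() returns None instead of raising
all_attrs = img.attrs              # the underlying dict of every attribute on the tag

print(src)
```

Subscripting is the shortest form when the attribute is known to exist; .get() is probably the safer choice when some img tags may lack a src.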
Upvotes: 6
Reputation: 880777
findNext returns the first item that matches the given criteria and appears after the given tag in the document. Note that this means any tag it returns is not guaranteed to be a child of the given tag (e.g. a child of the div tag).
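A small sketch of that pitfall, assuming BeautifulSoup 4, where findNext is also spelled find_next. The two-div document, its ids, and the file name outside.png are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical document: the first div contains no img, the second one does.
html = """
<div id="first"><p>no image here</p></div>
<div id="second"><img src="outside.png"></div>
"""

soup = BeautifulSoup(html, "html.parser")
first = soup.find("div", id="first")

# find_next walks forward through the whole document, so it leaves the first div.
next_img = first.find_next("img")

# find_all on the tag itself only searches inside that tag.
inside = first.find_all("img")

print(next_img.parent["id"])
print(inside)
```

Here the img that find_next returns belongs to the second div, while the search scoped to the first div comes back empty.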
Use findChildren to restrict the search to tags nested inside the given tag:
import bs4 as bs  # BeautifulSoup 4; under the legacy BeautifulSoup 3 this was "import BeautifulSoup as bs"
file_ = '''<html>
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
</html>
'''
soup = bs.BeautifulSoup(file_)
for x in soup.find(
        'div', attrs={'class':'media item avatar profile'}).findChildren('img'):
    print(x['src'])
yields
http://...jpeg
Upvotes: 0