evi

Reputation: 135

BeautifulSoup: how to select certain tag

I am confused about how Beautiful Soup works when you want to grab a child of a tag. I have the following HTML code:

<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>    

I want to grab the src attribute of the img tag. I am using the following code:

soup = BeautifulSoup(file_)
for x in soup.find('div', attrs={'class':'media item avatar profile'}).findNext('img'):
    print x 

This prints the whole img tag. How do I select only the src?

Thank you.

Upvotes: 5

Views: 20800

Answers (3)

root

Reputation: 80436

I think you would want something like:

soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']

In [1]: from bs4 import BeautifulSoup

In [2]: html = """\
   ...: <div class="media item avatar profile">
   ...: <a href="http://..." class="media-link action-medialink">
   ...: <img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
   ...: </a>
   ...: </div>"""

In [3]: soup = BeautifulSoup(html)

In [4]: soup.find('div', attrs={'class':'media item avatar profile'}).a.img['src']
Out[4]: 'http://...jpeg'

Upvotes: 3

Martijn Pieters

Reputation: 1124758

src is an attribute of the tag. Once you have the tag, access the attributes as you would dictionary keys; you only found the a tag so you need to navigate to the contained img tag too:

for x in soup.find_all('div', attrs={'class':'media item avatar profile'}):
    print x.a.img['src']

Your code used findNext() which returns a tag object; looping over that gives you the children, so x was the img object. I changed this to be a bit more direct and clearer. x is now the div, and we navigate directly to the first a and contained img tag.
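As a rough sketch of the dictionary-style access described above, the snippet below reads src three ways. It assumes the bs4 package and the standard-library "html.parser" parser, neither of which the question specifies; the attribute name data-missing is made up purely to show what happens when an attribute is absent.

```python
from bs4 import BeautifulSoup

html = """
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>
"""

# Assumption: bs4 with the stdlib "html.parser" backend.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "media item avatar profile"})
img = div.a.img  # first <a> inside the div, then the first <img> inside that

src = img["src"]                   # dictionary-style lookup; KeyError if absent
missing = img.get("data-missing")  # hypothetical attribute; .get() returns None
all_attrs = dict(img.attrs)        # every attribute of the tag as a plain dict

print(src)
```

Using .get() instead of [] may be the safer choice when some img tags on a page lack a src.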

Upvotes: 6

unutbu

Reputation: 880777

findNext returns the first item that matches the given criteria and appears after the given tag in the document. Note that this means the tag it returns is not guaranteed to be a child of the given tag (here, a child of the div tag).
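To make the difference concrete, here is a small sketch using a made-up document in which the div contains no img at all. It assumes bs4 (where findNext is spelled find_next) and the standard-library "html.parser" parser; the file name outside.jpeg is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the only <img> sits outside the div.
html = """
<div class="media item avatar profile"><a href="#">no image here</a></div>
<p><img src="outside.jpeg"></p>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "media item avatar profile"})

after = div.find_next("img")  # walks forward through the rest of the document
inside = div.find("img")      # searches only inside the div

print(after["src"])  # outside.jpeg
print(inside)        # None
```

So find_next happily returns an image that has nothing to do with the div, while find on the div reports that there is none.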

Use findChildren to restrict the search to tags nested inside the given tag:

import BeautifulSoup as bs

file_ = '''<html>
<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>  
</html>
'''
soup = bs.BeautifulSoup(file_)
for x in soup.find(
        'div', attrs={'class':'media item avatar profile'}).findChildren('img'):
    print(x['src'])

yields

http://...jpeg
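One detail worth knowing: by default findChildren searches all descendants, not just direct children, which is why it finds an img nested inside the a tag. The sketch below shows the difference with recursive=False. It assumes bs4 and the standard-library "html.parser" parser, and uses find_all, the bs4 name for the same search.

```python
from bs4 import BeautifulSoup

html = """<div class="media item avatar profile">
<a href="http://..." class="media-link action-medialink">
<img class="media-item-img" src="http://...jpeg" alt="name" title="name" width="150" height="200">
</a>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "media item avatar profile"})

descendants = div.find_all("img")               # any depth below the div
direct = div.find_all("img", recursive=False)   # direct children of the div only

srcs = [img["src"] for img in descendants]
print(srcs)         # ['http://...jpeg']
print(len(direct))  # 0, because the img is a grandchild of the div
```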

Upvotes: 0
