I am using bs4 and want to extract the href of a specified image. For example, in the HTML code I have:
<div style="text-align:center;"><a href="page/folder1/image.jpg" target="_blank"><img src="page_files/image.jpg" alt="Picture" border="0" width="150" height="150"></a></div>
</div>
I have the image src (page_files/image.jpg) and want to extract the corresponding href, which in this example is page/folder1/image.jpg. I was trying to use the find_previous method, but I have a small problem extracting the href content:
import bs4

soup = bs4.BeautifulSoup(page, "html.parser")
for img in soup('img'):
    imgLink = img.find_previous("a")
This returns the whole tag:
<a href="Here_is_link"><img alt="Tumblr" border="0" src="Here_is_source"/></a>
But I can't get the href content, because when I try:
imgLink = img.find_previous("a")['href']
I get an error.
The same happens when I try to use find_parent, like:
imgLink = img.find_parent("a")['href']
How can I fix that? And which is better: find_previous() or find_parent()?
Make sure you are only looking for images that have an <a> parent tag with an href attribute:
for img in soup.select('a[href] img'):
    link = img.find_parent('a', href=True)
    print(link['href'])
The CSS selector picks only images that sit inside an <a href="..."> tag, that is, images with an <a> ancestor that has an href attribute. The find_parent() call, with href=True, then again limits the search to <a> tags that have the attribute set.
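A minimal, self-contained sketch of the approach above, assuming Python 3 with bs4 installed and using the HTML snippet from the question. The helper name `href_for_src` is hypothetical, added only to cover the question's case where the image src is already known:

```python
from bs4 import BeautifulSoup

# HTML snippet from the question (outer container omitted)
html = '''
<div style="text-align:center;"><a href="page/folder1/image.jpg" target="_blank"><img src="page_files/image.jpg" alt="Picture" border="0" width="150" height="150"></a></div>
'''

soup = BeautifulSoup(html, "html.parser")

# Only images nested somewhere inside an <a href="..."> tag are selected,
# so find_parent() is guaranteed to find a matching <a> for each of them
links = [img.find_parent("a", href=True)["href"] for img in soup.select("a[href] img")]
print(links)


def href_for_src(soup, src):
    """Hypothetical helper: return the href of the <a> enclosing the image
    with the given src, or None if there is no such image or link."""
    img = soup.find("img", src=src)
    if img is None:
        return None  # no <img> with that src
    link = img.find_parent("a", href=True)
    return link["href"] if link is not None else None


print(href_for_src(soup, "page_files/image.jpg"))
```

Returning None instead of indexing blindly keeps the lookup from raising when the image is missing or not wrapped in a link.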
If you are searching for all images, chances are you are finding some that have an <a> parent or preceding tag without an href attribute; <a> tags can also be used as link targets, with <a name="...">, for example. If you are getting NoneType errors, that simply means there is no such parent tag for the given <img> tag, so the method returned None; a KeyError on ['href'] means the <a> tag that was found has no href attribute.
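To illustrate the find_previous() versus find_parent() difference from the question, here is a small sketch on hypothetical markup (not taken from the question): a named anchor `<a name="gallery">` sits right before an image that is not wrapped in a link. find_previous("a") walks backwards through the whole document and returns that anchor, which has no href, so ['href'] on it would raise a KeyError. find_parent("a", href=True) only checks the image's ancestors and returns None, so ['href'] on that result would fail with a NoneType error. For the image that is inside a link, both return the same tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a text link, a named anchor, an unlinked image, then a linked image
html = '''
<p><a href="about.html">About</a></p>
<a name="gallery"></a>
<img src="page_files/logo.jpg" alt="Logo">
<a href="page/folder1/image.jpg"><img src="page_files/image.jpg" alt="Picture"></a>
'''

soup = BeautifulSoup(html, "html.parser")

results = []
for img in soup.find_all("img"):
    # find_previous looks at everything earlier in the document, not just enclosing tags
    previous = img.find_previous("a")
    # find_parent only walks up the ancestors of the <img>
    parent = img.find_parent("a", href=True)
    results.append((
        img["src"],
        previous.attrs if previous is not None else None,  # attributes of the tag found
        parent["href"] if parent is not None else None,    # check for None before indexing
    ))

for row in results:
    print(row)
```

In this sketch, find_parent() is the safer choice when the link is expected to wrap the image, because it cannot pick up an unrelated earlier <a> tag.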