Reputation: 7100
I am grabbing a bunch of HTML from a service and parsing it lightly. I am looking for a way to get the link (the src attribute) from the first image tag.
Something similar to this jQuery code:
var imagelink = $('img:first', feed.content).attr('src');
But of course using only Python/Django (the server runs on Google App Engine). I'd rather not pull in any other libraries just to grab a simple link.
Upvotes: 3
Views: 2692
Reputation: 291
This is exactly what I'm looking for. Actually, the real code is like this:
tree = BeautifulSoup(raw_html)
img_link = tree.find_all('img')[0].get('src')
Works great! Thanks, timmy-omahony.
Upvotes: 3
Reputation: 7100
If I do any more HTML parsing I will probably look into one of the libraries suggested. But for now I have solved it like this:
# Locate the first <img tag; find() returns -1 when there is none
startImgPos = post.find('<img')
if startImgPos > -1:
    endImgPos = post.find('>', startImgPos)
    imageTag = post[startImgPos:endImgPos]
    # Look for the src attribute inside that tag only
    startSrcPos = imageTag.find('src="')
    if startSrcPos > -1:
        startSrcPos += 5  # skip past 'src="'
        endSrcPos = imageTag.find('"', startSrcPos)
        linkTag = imageTag[startSrcPos:endSrcPos]
        r['linktag'] = linkTag
I'll improve this later, but for now it does the trick. Feel free to suggest any more ideas/improvements to the above code.
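One possible improvement that still avoids third-party libraries is the standard library's HTML parser. The following is only a minimal sketch, assuming Python 3 (where the module is `html.parser`; on Python 2 it was named `HTMLParser`). The names `FirstImageParser` and `first_image_src` are made up for illustration, and it returns the src of the first img tag that actually has a src attribute.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Remember the src of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list
        # of (name, value) pairs. Self-closing <img ... /> tags also
        # end up here via the default handle_startendtag.
        if tag == 'img' and self.src is None:
            self.src = dict(attrs).get('src')


def first_image_src(html):
    """Return the src of the first <img> in html, or None if none is found."""
    parser = FirstImageParser()
    parser.feed(html)
    return parser.src
```

With this, `r['linktag'] = first_image_src(post)` would replace the manual string searching, and it also copes with single-quoted attributes and uppercase tags.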
Upvotes: 0
Reputation: 53998
You can use BeautifulSoup to do this:
http://www.crummy.com/software/BeautifulSoup/
It's an XML/HTML parser. You pass in the raw HTML, and then you can search it for particular tags, attributes, etc.
Something like this should work:
tree = BeautifulSoup(raw_html)
# find() returns the first matching tag (or None), so no [0] index is needed
img_link = tree.find('img').get('src')
Upvotes: 9