user2784753

How to scrape the "right" photos from webpage?

I'm making a simple news app. I have the article, but I need to select the right photo to go with it.

For example, in:

http://www.politico.com/story/2013/09/government-shutdown-2013-gop-narrative-97521.html

I want to scrape the URL of the photo of the three people. However, the page contains several images. How do I know which one is the right photo? What logic do news.google and Flipboard use to pick the "right" photo from this article, or from any article?

I've noticed that most of the time these photos are in a slideshow. How can I scrape the photos from these slideshows using Beautiful Soup?

Answers (1)

David Robinson

That page has a meta tag defined by the Open Graph protocol:

<meta property="og:image" content="http://images.politico.com/global/2013/09/29/mccarthy_blackburn_cruz_ap_ftn_ap_328.jpg"/> 

That gives the image that the site's creators suggest be used as a preview (which is indeed the picture of the three people).

You could get the address of this image using BeautifulSoup like so:

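# Python 3: urllib2 was merged into urllib.request
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "http://www.politico.com/story/2013/09/government-shutdown-2013-gop-narrative-97521.html"
# Name a parser explicitly; recent BeautifulSoup versions warn when none is given
bs = BeautifulSoup(urlopen(url), "html.parser")

# The suggested preview image is in <meta property="og:image" content="...">
metatag = bs.find("meta", {"property": "og:image"})
if metatag is not None:
    print(metatag["content"])
else:
    print("This page has no Open Graph meta image tag")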
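
Not every page has an og:image tag, so you may want fallbacks. I don't know what logic Google News or Flipboard actually use; the sketch below just assumes a common heuristic: take every og:image tag in document order (Open Graph allows several, and the first one is preferred), then twitter:image, then the older <link rel="image_src">, and only then guess from <img> tags ranked by their declared width * height. The function name find_preview_images is hypothetical, and ranking by declared size is a rough guess that fails when sizes are set in CSS. Returning all candidates rather than one may also help with slideshow pages that list several og:image tags; otherwise slideshow markup is site-specific.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_preview_images(html, base_url):
    """Return candidate preview-image URLs, most preferred first.

    Hypothetical helper; the order below is an assumed heuristic, not
    the documented behaviour of any news aggregator.
    """
    soup = BeautifulSoup(html, "html.parser")
    candidates = []

    def add(url):
        # Resolve relative URLs against the page URL and skip duplicates
        if url:
            absolute = urljoin(base_url, url.strip())
            if absolute not in candidates:
                candidates.append(absolute)

    # 1. Open Graph: possibly several og:image tags, first one preferred
    for tag in soup.find_all("meta", attrs={"property": "og:image"}):
        add(tag.get("content"))

    # 2. Twitter Cards: usually name=, sometimes property=
    for attr in ("name", "property"):
        tag = soup.find("meta", attrs={attr: "twitter:image"})
        if tag is not None:
            add(tag.get("content"))

    # 3. Older convention: <link rel="image_src" href="...">
    link = soup.find("link", rel="image_src")
    if link is not None:
        add(link.get("href"))

    # 4. Last resort: <img> tags, larger declared width*height first
    def declared_area(img):
        try:
            return int(img.get("width", 0)) * int(img.get("height", 0))
        except (TypeError, ValueError):
            # Missing or non-numeric sizes such as "100%" count as 0
            return 0

    images = [img for img in soup.find_all("img") if img.get("src")]
    for img in sorted(images, key=declared_area, reverse=True):
        add(img["src"])

    return candidates
```

For a live page, pass `urlopen(url).read()` together with the page URL and treat the first element of the returned list as the best guess.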
