LonelySoul

Reputation: 1232

Fetching Image from URL using BeautifulSoup

I am trying to fetch the main images (not thumbnails or other GIFs) from a Wikipedia page using the following code. However, "imgs" comes back with a length of 0. Any suggestions on how to fix this?

Code:

import urllib
import urllib2
from bs4 import BeautifulSoup
import os

html = urllib2.urlopen("http://en.wikipedia.org/wiki/Main_Page")

soup = BeautifulSoup(html)

imgs = soup.findAll("div",{"class":"image"})

Also, if someone could explain in detail how to choose the arguments to findAll by looking at the element source of the webpage, that would be awesome.

Upvotes: 0

Views: 7910

Answers (2)

alecxe

Reputation: 473863

On that page the image class is on the a tags, not on div tags, so search for a instead:

>>> img_links = soup.findAll("a", {"class":"image"})
>>> for img_link in img_links:
...     print img_link.img['src']
... 
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg
...

Or, even better, use a.image > img CSS selector:

>>> for img in soup.select('a.image > img'):
...      print img['src']
//upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Stora_Kronan.jpeg/100px-Stora_Kronan.jpeg
//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg 
...
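
To tie this back to reading the page source: findAll (spelled find_all in current bs4) takes the tag name first and an attribute filter second, and both have to match the same element. Here is a minimal sketch in Python 3 on a small inline fragment. The fragment, the upload.example.org URLs and the element id are made up for illustration and are not copied from Wikipedia; the only thing they share with the real page is that the image class sits on the a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup described above:
# the "image" class is on the <a> tag, and the <img> is its direct child.
html = """
<div id="example-box">
  <a href="/wiki/File:Example.jpg" class="image">
    <img src="//upload.example.org/thumb/Example.jpg" width="100" height="75">
  </a>
  <img src="//upload.example.org/icon.gif" width="16" height="16">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name and attributes must both match the same element.
# There is a <div>, but it has no class "image", so this list is empty.
divs = soup.find_all("div", {"class": "image"})

# Same class filter with the tag name that actually carries the class.
links = soup.find_all("a", {"class": "image"})

# CSS selector: <img> elements that are direct children of <a class="image">.
srcs = [img["src"] for img in soup.select("a.image > img")]

print(len(divs), len(links), srcs)
```

So when you inspect an element in the browser, read off the tag name and the attributes of that same tag, and pass exactly those to find_all. The bare img outside the link is not matched by the selector, which is how the icon gets skipped here.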

Update (downloading the images with urllib.urlretrieve):

# Python 2
from urllib import urlretrieve
import urlparse
from bs4 import BeautifulSoup
import urllib2

url = "http://en.wikipedia.org/wiki/Main_Page"
soup = BeautifulSoup(urllib2.urlopen(url))
for img in soup.select('a.image > img'):
    # src is protocol-relative ("//upload..."), so resolve it against the page URL
    img_url = urlparse.urljoin(url, img['src'])
    # use the last path segment as the local file name
    file_name = img['src'].split('/')[-1]
    urlretrieve(img_url, file_name)
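
In case the urljoin step looks like magic: the src values above start with "//", meaning they have a host but no scheme, and urljoin fills the scheme in from the page URL. A small sketch of just that step, written for Python 3 (where urlparse became urllib.parse), using one of the src values from the output above. Decoding the percent-escapes in the file name with unquote is an optional extra of mine, not something the snippet above does.

```python
from urllib.parse import urljoin, urlsplit, unquote
import posixpath

page_url = "http://en.wikipedia.org/wiki/Main_Page"
src = ("//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/"
       "Christuss%C3%A4ule_8.jpg/77px-Christuss%C3%A4ule_8.jpg")

# A src starting with "//" has no scheme; urljoin borrows it from page_url.
img_url = urljoin(page_url, src)

# Last path segment of the URL, with percent-escapes decoded (optional step).
file_name = unquote(posixpath.basename(urlsplit(img_url).path))

print(img_url)
print(file_name)
```

Without unquote the file would be saved under the escaped name (with %C3%A4 in it), which also works; it is just less readable.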

Upvotes: 6

Jay

Reputation: 9582

I don't see any div tags with a class called 'image' on that page.

You could select all the img tags and then discard the ones that are small.

imgs = soup.select('img')
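
One possible way to do the "throw away" part, as a rough sketch in Python 3: it assumes the img tags declare a width attribute, which you would need to check in the page source, and it runs on a made-up inline fragment rather than the live page. MIN_WIDTH and declared_width are names I picked for the example, and the cutoff value is arbitrary.

```python
from bs4 import BeautifulSoup

MIN_WIDTH = 50  # hypothetical cutoff in pixels; tune it for your page

# Made-up fragment: one large image, one icon, one tag with no declared size.
html = """
<img src="//upload.example.org/photo.jpg" width="120" height="90">
<img src="//upload.example.org/icon.gif" width="16" height="16">
<img src="//upload.example.org/no-size.png">
"""

soup = BeautifulSoup(html, "html.parser")


def declared_width(img):
    """Return the tag's width attribute as an int, or 0 if missing or non-numeric."""
    try:
        return int(img.get("width", 0))
    except (TypeError, ValueError):
        return 0


# Keep only the images whose declared width reaches the cutoff.
large = [img["src"] for img in soup.select("img")
         if declared_width(img) >= MIN_WIDTH]

print(large)
```

Images with no width attribute are treated as small and dropped here. If you would rather keep them, change the fallback value, or download the file and check its real size instead.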

Upvotes: 1
