I'm looking for a way to extract all the main images of a web page. The easy way is to do it with lxml:
import lxml.html
import requests

# .text is a property on the response, not a method
html = requests.get('https://fr.wikipedia.org/wiki/Image').text
tree = lxml.html.fromstring(html)
# every <img> element that has a src attribute
img = tree.xpath('//img[@src]')
This way we get every image, including logos, icons, pictograms, CSS sprites, etc. What I would like to get is only the real images that are part of the content. Any ideas? Thanks.