Python: how to get all images that are not part of the template

Question

I'm looking for a way to extract all the main images of a web page. The easy way is to do it with lxml:

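import lxml.html
import requests

# Download the page; .text is a property, not a method
html = requests.get('https://fr.wikipedia.org/wiki/Image').text

tree = lxml.html.fromstring(html)
# Every <img> element that has a src attribute
img = tree.xpath('//img[@src]')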

This way we get all images, including logos, icons, pictograms, CSS sprites, etc. What I would like to get is only the real images that are part of the content. Any ideas? Thanks
