Reputation: 4999
Can someone help me parse an HTML file in Python to get the links for all the images in it?
Preferably without a third-party module...
Thanks!
Upvotes: 10
Views: 18311
Reputation: 73292
You can use Beautiful Soup. I know you asked to avoid third-party modules, but it is an ideal tool for parsing HTML.
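from urllib.request import urlopen
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4
page = BeautifulSoup(urlopen("http://www.url.com"), "html.parser")
# find_all returns every <img> tag; keep each tag's src attribute (None if absent)
image_links = [img.get("src") for img in page.find_all("img")]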
Upvotes: 11
Reputation: 633
lxml is generally considered faster than Beautiful Soup (ref). Its tutorial can be found here: (link). You may also want to look at this older Stack Overflow post.
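A minimal sketch of the lxml approach, assuming the third-party lxml package is installed; the function name image_links and the sample markup are hypothetical. lxml.html.fromstring parses the markup, and the XPath expression //img/@src selects the src attribute of every img element, so img tags without a src are simply left out.

```python
# Assumes the third-party lxml package is installed (pip install lxml).
from lxml import html


def image_links(markup):
    """Return the src of every <img> element that has one, in document order."""
    doc = html.fromstring(markup)
    # XPath: the src attribute of every img element anywhere in the tree;
    # convert lxml's string-result objects to plain str for convenience
    return [str(src) for src in doc.xpath("//img/@src")]


# Hypothetical sample document
sample = """<html><body>
<img src="a.png"><p>text</p><img src="/static/b.jpg" alt="b">
<img alt="no source">
</body></html>"""
print(image_links(sample))
```

Relative paths such as /static/b.jpg come back exactly as written in the page; resolve them against the page URL yourself if you need absolute links.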
Upvotes: 2
Reputation: 10663
Using only the Python standard library (PSL):
from html.parser import HTMLParser

class MyParse(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Print the src of each <img>; skip tags that have no src
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                print(src)

h = MyParse()
with open("index.html") as f:
    h.feed(f.read())
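The parser in this answer prints each link as it finds it. If you want the links back as a list instead, a minimal sketch (the class name ImageLinkParser, the function image_links and the base URL are hypothetical, not part of the original answer) can collect them on the parser instance and resolve relative paths with urllib.parse.urljoin, still using only the standard library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip <img> tags with a missing or empty src instead of raising KeyError
        if src:
            self.links.append(urljoin(self.base_url, src))

    # A self-closing <img ... /> goes through handle_startendtag, whose default
    # implementation calls handle_starttag, so it is collected as well.


def image_links(markup, base_url=""):
    parser = ImageLinkParser(base_url)
    parser.feed(markup)
    parser.close()
    return parser.links


sample = '<p><IMG SRC="logo.png"><img alt="none"><img src="/img/a.jpg"/></p>'
print(image_links(sample, "http://example.com/docs/"))
# ['http://example.com/docs/logo.png', 'http://example.com/img/a.jpg']
```

With the default empty base_url, urljoin returns each src unchanged, so the function also works when you only want the raw attribute values.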
Upvotes: 11