Reputation: 77
I've looked through other posts and tried to apply their suggestions to my code, but I'm still missing something.
What I am trying to do is get all the image links from a website, specifically reddit.com. Once I have the links, I want to either display the images in my browser or download them and display them through Windows Image Viewer. I am just trying to practice and broaden my Python skills.
I am stuck at obtaining the links and choosing how to display the images. What I have right now is:
import urllib2
import re
links=urllib2.urlopen("http://www.reddit.com").read()
found=re.findall("http://imgur.com/+\w+.jpg", links)
print found #Just for testing purposes, to see what links are found
Thanks for the help.
Upvotes: 2
Views: 1357
Reputation: 1122152
The imgur.com links on reddit do not have any .jpg extensions, so your regular expression won't match anything. You should be looking for the i.imgur.com domain instead.
Matching re.findall(r"http://i\.imgur\.com/\w+\.jpg", links) does return results:
>>> re.findall(r"http://i\.imgur\.com/\w+\.jpg", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']
You can expand this to other file extensions:
>>> re.findall(r"http://i\.imgur\.com/\w+\.(?:jpg|gif|png)", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/bPs5N.gif', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']
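The results above contain repeats because the same image URL appears more than once in the page. As a minimal sketch (written for Python 3, not the Python 2 session shown above), the same pattern can be compiled once and the matches de-duplicated while keeping their order. The sample_html string and the find_image_links helper are hypothetical stand-ins for illustration, not part of the original code.

```python
import re

# Hypothetical sample standing in for the downloaded page source.
sample_html = (
    '<a href="http://imgur.com/PMNZ2">page link, no extension</a>'
    '<a href="http://i.imgur.com/PMNZ2.jpg">direct link</a>'
    '<img src="http://i.imgur.com/PMNZ2.jpg">'
    '<a href="http://i.imgur.com/rsIfN.png">direct link</a>'
    '<a href="http://i.imgur.com/bPs5N.gif">direct link</a>'
)

# Dots are escaped so they only match a literal "." character.
IMAGE_URL = re.compile(r"http://i\.imgur\.com/\w+\.(?:jpg|gif|png)")


def find_image_links(html):
    """Return matching image URLs in first-seen order, without repeats."""
    seen = set()
    unique = []
    for url in IMAGE_URL.findall(html):
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


image_links = find_image_links(sample_html)
print(image_links)
```

The plain imgur.com page link in the sample is skipped, which is the same reason the pattern from the question found nothing.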
You may want to use a proper HTML parser instead of a regular expression; I can recommend both BeautifulSoup and lxml. It'll make it much easier to find all <img /> tags that use i.imgur.com links with those tools, including .gif and .png files, if any.
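To illustrate the parser-based approach without any third-party install, here is a rough sketch that uses Python 3's standard-library html.parser in place of BeautifulSoup or lxml. The ImgurImageCollector class and the sample_page string are hypothetical, and checking <a href> in addition to <img src> is an assumption about where the links might sit in the page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".gif", ".png")


class ImgurImageCollector(HTMLParser):
    """Collect i.imgur.com image URLs from <img src> and <a href> attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Only <img src="..."> and <a href="..."> are inspected.
        wanted = {"img": "src", "a": "href"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            parsed = urlparse(value)
            if (parsed.netloc == "i.imgur.com"
                    and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
                    and value not in self.urls):
                self.urls.append(value)


# Hypothetical page fragment used only to exercise the collector.
sample_page = (
    '<a href="http://i.imgur.com/akg4l.jpg">'
    '<img src="http://i.imgur.com/akg4l.jpg" /></a>'
    '<img src="http://i.imgur.com/bPs5N.gif" />'
    '<a href="http://imgur.com/dAHtq">gallery page</a>'
    '<img src="http://example.com/logo.png" />'
)

collector = ImgurImageCollector()
collector.feed(sample_page)
print(collector.urls)
```

Comparing the parsed host and file extension, rather than pattern-matching the raw page text, means attribute order and quoting style in the HTML no longer matter. With BeautifulSoup, roughly the same filter would run over the tags returned by find_all.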
Upvotes: 3