user1943221

Reputation: 77

Trying to get all image links from reddit.com using Python and re

I've looked through other posts and tried to apply what they suggest in my code, but I'm still missing something.

What I am trying to do is get all the image links off a website, specifically reddit.com. Once I have the links, I want to either display the images in my browser or download them and display them through Windows Image Viewer. I am just trying to practice and broaden my Python skills.

I am stuck at obtaining the links and choosing how to display the images. What I have right now is:

import urllib2
import re
links=urllib2.urlopen("http://www.reddit.com").read()
found=re.findall("http://imgur.com/+\w+.jpg", links)
print found #Just for testing purposes, to see what links are found

Thanks for the help.

Upvotes: 2

Views: 1357

Answers (1)

Martijn Pieters

Reputation: 1122152

The imgur.com links on reddit do not have any .jpg extensions, so your regular expression won't match anything. You should be looking for the i.imgur.com domain instead.

Matching i.imgur.com links instead does return results (using a raw string, with the dots escaped so that they only match a literal "."):

>>> re.findall(r"http://i\.imgur\.com/\w+\.jpg", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']

You can expand this to other file extensions:

>>> re.findall(r"http://i\.imgur\.com/\w+\.(?:jpg|gif|png)", links)
['http://i.imgur.com/PMNZ2.jpg', 'http://i.imgur.com/akg4l.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/dAHtq.jpg', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/rsIfN.png', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/nT73r.jpg', 'http://i.imgur.com/bPs5N.gif', 'http://i.imgur.com/z2wIl.jpg', 'http://i.imgur.com/z2wIl.jpg']
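
The result lists above contain duplicates, because the same link can appear more than once in the page. Here is a minimal sketch of dropping the repeats while keeping first-seen order. It is written for Python 3, and the helper name find_imgur_images and the sample markup are made up for illustration; they are not real reddit output:

```python
import re

# Dots are escaped so they match a literal "." only.
IMGUR_IMAGE = re.compile(r"http://i\.imgur\.com/\w+\.(?:jpg|gif|png)")


def find_imgur_images(html):
    """Return unique i.imgur.com image URLs in order of first appearance."""
    seen = set()
    unique = []
    for url in IMGUR_IMAGE.findall(html):
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Hypothetical sample markup, only for demonstration.
sample = (
    '<a href="http://i.imgur.com/PMNZ2.jpg">'
    '<img src="http://i.imgur.com/PMNZ2.jpg"></a>'
    '<a href="http://i.imgur.com/rsIfN.png">link</a>'
    '<a href="http://imgur.com/abcde">album</a>'
)
print(find_imgur_images(sample))
```

The plain imgur.com link without an extension is skipped, which matches the point made at the start of this answer.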

You may want to use a proper HTML parser instead of a regular expression; I can recommend both BeautifulSoup and lxml. Those tools make it much easier to find all <img /> tags that use i.imgur.com links, including .gif and .png files, if any.
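
To give a rough idea of the parser-based approach, here is a sketch that uses the standard library's html.parser module rather than BeautifulSoup or lxml, purely so that it runs without installing anything. It is written for Python 3; the class name ImgurImageCollector and the sample markup are assumptions for illustration. With BeautifulSoup or lxml the same idea would likely take fewer lines:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgurImageCollector(HTMLParser):
    """Collect the src of every <img> tag hosted on i.imgur.com."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> tags, because the
        # default handle_startendtag() delegates to this method.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and urlparse(src).hostname == "i.imgur.com":
            self.image_urls.append(src)


# Hypothetical sample markup, only for demonstration.
sample = (
    '<div><img src="http://i.imgur.com/PMNZ2.jpg" />'
    '<img src="http://i.imgur.com/bPs5N.gif">'
    '<img src="http://example.com/logo.png"></div>'
)
collector = ImgurImageCollector()
collector.feed(sample)
print(collector.image_urls)
```

Checking the parsed hostname instead of pattern-matching the whole URL means the file extension no longer matters, so .jpg, .gif and .png images are all picked up.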

Upvotes: 3
