Python regex not matching images on website (it matches in regex helper)

Question

I do not understand what is wrong with my script below.

It is supposed to parse out images using regex. I've verified that my regex is correct by using http://regex101.com/.

The problem is that it doesn't even grab the first image on the website, even though it should.

The website in the script is an NSFW blog. Please don't follow the link if you are offended by nudity or sexuality.

from urllib2 import urlopen
import re

base = "http://bassrx.tumblr.com"
url = "http://bassrx.tumblr.com/tagged/tt"

def parse_page(url):
    # returns html for parsing
    page = urlopen(url)
    html = page.read()
    return html

def get_links(html):
    # returns list of all image urls on page
    jpgs = re.findall("src.\"(.*?500.jpg)", html, re.IGNORECASE)
    #pngs = re.findall("src.\"(.*?media.tumblr.*?tumblr_.*?png)", html, re.IGNORECASE)
    #links = jpgs + pngs
    return jpgs


html = parse_page(url)      # get the html for first page
links = get_links(html)     # get all relevant image links
print links
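
One way to narrow this down might be to look at how the image URLs actually appear in the string returned by parse_page, since it may not be identical to what the browser shows. The following is only a sketch: show_context is a hypothetical helper that is not part of the script above, the fetched string is a made-up stand-in for the real page, and it assumes the HTML is available as a text string.

```python
import re

def show_context(html, needle="500.jpg", width=40):
    """Return the text leading up to each occurrence of needle in html."""
    snippets = []
    for m in re.finditer(re.escape(needle), html, re.IGNORECASE):
        start = max(0, m.start() - width)
        snippets.append(html[start:m.end()])
    return snippets

# Hypothetical stand-in for the value returned by parse_page(url).
fetched = "<img data-src='http://example.com/tumblr_abc_500.jpg'>"

for snippet in show_context(fetched):
    print(snippet)
```

If the snippets show a different attribute name or quoting style than src=" (as in the made-up example here), that would explain why the pattern finds nothing in the fetched HTML.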

The very first image has the following HTML:

src="http://37.media.tumblr.com/tumblr_m9q9feJcxl1qi02clo3_500.jpg" alt="">

I would like to know why it doesn't grab this image (and also misses most of the others).
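
As a sanity check, the pattern can be run against that single HTML line in isolation. This is a minimal sketch, assuming the sample string below is copied verbatim from the page source as seen in the browser; the names PATTERN and sample are introduced here only for illustration.

```python
import re

# Same pattern as in get_links, written as a raw string.
PATTERN = r'src."(.*?500.jpg)'

# The HTML of the first image, as quoted above.
sample = 'src="http://37.media.tumblr.com/tumblr_m9q9feJcxl1qi02clo3_500.jpg" alt="">'

matches = re.findall(PATTERN, sample, re.IGNORECASE)
print(matches)
```

The pattern does match this line, which agrees with the regex101 result and suggests that the difference lies in the HTML the script receives rather than in the regex itself.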
