zeller

Reputation: 4974

urllib.urlopen returns an old page?

I have a very simple HTML page (a directory listing) that I read with urllib like this:

import urllib
page = urllib.urlopen(coreRepositoryUrl).read()

The problem is that the HTML I get back is an older version of the page, not the current one. Calling info() on the response returns this:

Date: Fri, 19 Apr 2013 18:48:09 GMT
Server: Apache/2.0.52 (Fedora)
Content-Type: text/html; charset=UTF-8
Connection: close
Age: 481084

The page was last updated today (2013-04-25). Which component might be doing the caching?

Upvotes: 4

Views: 1973

Answers (1)

acj

Reputation: 134

Add a "Cache-Control" header with the value "max-age=0" to your request:

import urllib2
req = urllib2.Request(url)
req.add_header('Cache-Control', 'max-age=0')
resp = urllib2.urlopen(req)
content = resp.read()

With that header, each cache along the way has to revalidate its entry with the origin server before answering.
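To check whether the header actually helped, you can look at the Age header of the response. Age is normally added by an HTTP cache rather than by the origin server, and the value in the question (481084 seconds) is roughly five and a half days. Below is a minimal sketch, not a definitive implementation. The helper names build_fresh_request, cache_age_seconds and fetch_fresh are made up for illustration, and the try/except import is only there so the same code runs on Python 2 (as in the question) and Python 3.

```python
try:
    from urllib2 import Request, urlopen          # Python 2, as used above
except ImportError:
    from urllib.request import Request, urlopen   # Python 3 equivalent


def build_fresh_request(url):
    """Build a request that asks caches to revalidate before answering.

    Hypothetical helper, not part of urllib/urllib2.
    """
    req = Request(url)
    req.add_header('Cache-Control', 'max-age=0')
    return req


def cache_age_seconds(headers):
    """Return the Age header as an int, or None if absent or malformed.

    `headers` can be the object returned by resp.info() or a plain dict.
    """
    value = headers.get('Age')
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def fetch_fresh(url):
    """Fetch `url` and return (body, age_in_seconds_or_None)."""
    resp = urlopen(build_fresh_request(url))
    try:
        return resp.read(), cache_age_seconds(resp.info())
    finally:
        resp.close()
```

If fetch_fresh(coreRepositoryUrl) still reports a large age, a cache between you and the server is probably not honouring the request header, and the cache configuration is the next place to look.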

Upvotes: 3
