Cantfindname

Reputation: 2148

urllib2.urlopen() does not return the same page as Chrome

I am trying to make a small program that downloads subtitles for movie files.

However, I noticed that following a link in Chrome and opening the same link with urllib2.urlopen() do not give the same result.

As an example, consider the link http://www.opensubtitles.org/en/subtitleserve/sub/5523343 . In Chrome this redirects to http://osdownloader.org/en/osdownloader.subtitles-for.you/subtitles/5523343 which, after a short delay, downloads the file I want.

However, when I use the following code in Python, I get redirected to a different page:

import urllib2

url = "http://www.opensubtitles.org/en/subtitleserve/sub/5523343"
response = urllib2.urlopen(url)

# response.url holds the final URL after urllib2 has followed any redirects
if response.url == url:
    print "No redirect"
else:
    print url, " --> ", response.url

Result: http://www.opensubtitles.org/en/subtitleserve/sub/5523343 --> http://www.opensubtitles.org/en/subtitles/5523343/the-musketeers-commodities-en

Why does that happen? How can I follow the same redirect as with the browser?

(I know that these sites offer APIs for Python, but this is meant as Python practice and a first try at urllib2.)

Upvotes: 2

Views: 758

Answers (1)

Niklas9

Reputation: 9396

There is a significant difference between the request Chrome makes and the one your urllib2 script makes: the User-Agent HTTP header (https://en.wikipedia.org/wiki/User_agent).
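To see the difference concretely, here is a minimal sketch that prints the User-Agent a default opener attaches to every request. The helper name default_user_agent is only for illustration, and the import fallback is an assumption so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urllib_request  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib_request  # Python 3 equivalent


def default_user_agent():
    """Return the User-Agent value that a default opener sends."""
    opener = urllib_request.build_opener()
    # addheaders is a list of (name, value) tuples applied to every request
    return dict(opener.addheaders)["User-agent"]


print(default_user_agent())
```

The value has the form Python-urllib/<version>, which is easy for a site to tell apart from a browser.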

opensubtitles.org probably detects that you are retrieving the page programmatically and blocks it. Try sending one of Chrome's User-Agent strings (more here http://www.useragentstring.com/pages/Chrome/) from your script, for example:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36

For how to edit your script to send a custom User-Agent header, see the question "Changing user agent on urllib2.urlopen".
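As a rough sketch of what that change could look like, the snippet below builds a Request that carries a Chrome User-Agent and then reports the final URL. The names build_request and final_url are hypothetical helpers, the import fallback is an assumption for Python 3, and whether this alone reproduces Chrome's redirect on opensubtitles.org is not something I have verified.

```python
try:
    import urllib2 as urllib_request  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib_request  # Python 3 equivalent

# One of the Chrome User-Agent strings; any current browser string should do
CHROME_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36"
)


def build_request(url, user_agent=CHROME_USER_AGENT):
    """Create a Request that sends user_agent instead of the default one."""
    return urllib_request.Request(url, headers={"User-Agent": user_agent})


def final_url(url, user_agent=CHROME_USER_AGENT):
    """Open url (this hits the network) and return the URL after redirects."""
    response = urllib_request.urlopen(build_request(url, user_agent))
    try:
        return response.geturl()
    finally:
        response.close()
```

Calling final_url("http://www.opensubtitles.org/en/subtitleserve/sub/5523343") and comparing the result with the original URL gives the same check as the script in the question.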

I would also recommend the requests library for Python instead of urllib2, as its API is much easier to understand: http://docs.python-requests.org/en/latest/.
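For comparison, a minimal sketch of the same idea with requests, assuming the library is installed (pip install requests). The helper names prepare_request and redirect_chain are made up for this example; response.history is the list of intermediate redirect responses that requests keeps.

```python
import requests

CHROME_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36"
)


def prepare_request(url, user_agent=CHROME_USER_AGENT):
    """Build (but do not send) a GET request carrying the given User-Agent."""
    request = requests.Request("GET", url, headers={"User-Agent": user_agent})
    return request.prepare()


def redirect_chain(url, user_agent=CHROME_USER_AGENT):
    """Send the request (hits the network) and return every URL visited."""
    with requests.Session() as session:
        response = session.send(
            prepare_request(url, user_agent), allow_redirects=True, timeout=10
        )
        # history holds the redirect responses; response.url is the final URL
        return [r.url for r in response.history] + [response.url]
```

Printing redirect_chain(url) shows each hop, which makes it easier to see where the script and the browser diverge.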

Upvotes: 2
