Amit Kulkarni

Reputation: 25

Fast expansion of shortened URLs using python

I am writing Python code to expand shortened URLs fetched from Twitter. I have fetched all the URLs and stored them in a text file separated by a newline.

Currently I am using:

import urllib2
def expand(url):
    response = urllib2.urlopen(url)  # follows redirects
    return response.url              # final URL after redirects

to expand them.

But urlopen() is slow at expanding the URLs.

I have around 5.4 million URLs. Is there any faster way to expand them using Python?

Upvotes: 2

Views: 823

Answers (1)

Oliver Dain

Reputation: 9953

I suspect the issue is that network calls are slow and urllib2 blocks until it gets a response. For example, if the URL shortening service takes 200 ms to respond, you'll only be able to resolve 5 URLs/second with a single blocking loop; at that rate, 5.4 million URLs would take roughly 300 hours. However, if you use an async library, you should be able to send out lots of requests before the first answer comes back. Responses are then processed as they arrive. This should dramatically increase your throughput. There are a few Python libraries for this kind of thing (Twisted, gevent, etc.), so you might just want to Google for "Python async rest".
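As a rough illustration of the async idea, here is a minimal sketch using Python 3's asyncio rather than Twisted or gevent (an assumption on my part; the question uses Python 2's urllib2). The names expand_all, fake_resolve and max_in_flight are hypothetical. fake_resolve only simulates a 200 ms round trip; in real use you would pass a coroutine built on an async HTTP client of your choice that returns the final URL.

```python
import asyncio
import time


async def expand_all(urls, resolve, max_in_flight=100):
    """Resolve URLs concurrently.

    `resolve` is any coroutine function mapping a short URL to its final URL.
    Returns (url, result) pairs in input order; on failure the result is
    the exception that was raised.
    """
    # Cap the number of requests that are outstanding at any moment.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def expand_one(url):
        async with semaphore:
            try:
                return url, await resolve(url)
            except Exception as exc:  # one bad URL should not stop the batch
                return url, exc

    return await asyncio.gather(*(expand_one(u) for u in urls))


async def fake_resolve(url):
    """Stand-in for a real async HTTP call: waits 200 ms, makes no request."""
    await asyncio.sleep(0.2)
    return url + "/expanded"


if __name__ == "__main__":
    short_urls = ["http://sho.rt/%d" % i for i in range(50)]
    start = time.perf_counter()
    results = asyncio.run(expand_all(short_urls, fake_resolve))
    elapsed = time.perf_counter() - start
    # The simulated waits overlap, so the batch finishes far sooner than
    # 50 sequential 200 ms calls would.
    print("resolved %d URLs in %.2f s" % (len(results), elapsed))
```

With millions of URLs you would probably feed the file through in chunks of a few thousand lines rather than creating one task per URL up front.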

You could also try to do this with lots of threads. CPython releases the GIL during blocking socket I/O, so threads waiting on urllib2 responses can overlap. That probably won't be as fast as async, but it should still speed things up quite a bit.
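The threaded variant might look something like the sketch below. It assumes Python 3, where urllib.request is the successor to urllib2 and concurrent.futures provides a thread pool. The names expand, expand_many and workers are hypothetical, and the pool size of 50 is an arbitrary starting point to tune, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def expand(url, timeout=10):
    """Blocking expansion: urlopen follows redirects, .url is the final address."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.url


def expand_many(urls, resolve=expand, workers=50):
    """Run `resolve` over `urls` on a pool of threads.

    Returns (url, result) pairs in input order; on failure the result is
    the exception that was raised.
    """
    def safe(url):
        try:
            return url, resolve(url)
        except Exception as exc:  # dead links and timeouts are expected
            return url, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe, urls))
```

You would call it as expand_many(batch) on successive batches of lines from the text file. The resolve parameter is there so the blocking function can be swapped out, for example for one that also retries.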

Both of these solutions introduce quite a bit of complexity, but if you want to go fast...

Upvotes: 5
