Andrew

Reputation: 3999

How can I unshorten a URL?

I want to take a shortened or non-shortened URL and return its unshortened form. How can I write a Python program to do this?

Additional Clarification:

e.g. bit.ly/silly in the input array should be google.com in the output array
e.g. google.com in the input array should be google.com in the output array

Upvotes: 23

Views: 31449

Answers (10)

AYUSH KUMAR JAISWAL

Reputation: 9

This is a very easy task; you only need four lines of code:

import requests
url = input('Enter url : ')
site = requests.get(url)
print(site.url)

Just run this code: requests follows redirects automatically, so site.url is the unshortened URL.
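The question's own inputs, such as bit.ly/silly and google.com, have no scheme, and requests.get rejects those with a MissingSchema error. A small normalization step before the request avoids that. This is a minimal sketch: ensure_scheme is a hypothetical helper, and the http:// default is an assumption.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default="http"):
    """Prefix bare inputs such as 'bit.ly/silly' with a scheme (hypothetical helper)."""
    url = url.strip()
    if urlsplit(url).scheme in ("http", "https"):
        return url  # already an absolute http(s) URL
    if url.startswith("//"):
        return default + ":" + url  # scheme-relative form, e.g. //bit.ly/silly
    return default + "://" + url  # assumed default scheme


print(ensure_scheme("bit.ly/silly"))        # http://bit.ly/silly
print(ensure_scheme("https://google.com"))  # https://google.com
```

With it, the request above becomes requests.get(ensure_scheme(url)).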

Upvotes: 0

Tiago Peres

Reputation: 15642

You can use geturl():

from urllib.request import urlopen
url = "http://bit.ly/silly"  # urlopen needs a scheme such as http://
unshortened_url = urlopen(url).geturl()
print(unshortened_url)
# prints the final URL after all redirects have been followed
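To check what geturl() reports without depending on a real shortener, here is a self-contained sketch that starts a throwaway local server (the /short and /long paths and the handler name are hypothetical) and follows its redirect with urllib's default handlers, the same redirect handling urlopen uses. The server sends a relative Location, and geturl() still returns an absolute URL.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ShortenerHandler(BaseHTTPRequestHandler):
    """Toy shortener: /short answers 301 with a relative Location, /long is final."""

    def do_GET(self):
        if self.path == "/short":
            self.send_response(301)
            self.send_header("Location", "/long")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), ShortenerHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Default redirect handling as in urlopen(); ProxyHandler({}) ignores *_proxy env vars.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(base + "/short", timeout=5) as resp:
    unshortened_url = resp.geturl()  # absolute URL after the 301 was followed

server.shutdown()
server.server_close()
print(unshortened_url)
```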

Upvotes: 1

fmarm

Reputation: 4284

If you are using Python 3.5+, you can use the unshortenit module, which makes this very easy:

from unshortenit import UnshortenIt
unshortener = UnshortenIt()
uri = unshortener.unshorten('https://href.li/?https://example.com')
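The href.li URL in that snippet is a case where no HTTP request is strictly needed: the destination appears verbatim after the ?. I don't know how unshortenit handles it internally; this is a stdlib sketch of pulling out such an embedded target, assuming the destination is the entire raw query string (the helper name embedded_target is hypothetical).

```python
from urllib.parse import urlsplit


def embedded_target(url):
    """Return the destination carried as the raw query string, or None.

    Assumes a wrapper format like https://href.li/?<destination>.
    """
    query = urlsplit(url).query
    if query.startswith(("http://", "https://")):
        return query
    return None  # nothing embedded; fall back to following HTTP redirects


print(embedded_target("https://href.li/?https://example.com"))  # https://example.com
```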

Upvotes: 5

Daniel Cambría

Reputation: 247

To unshorten a URL, you can use requests. This is a simple solution that works for me:

import requests
url = "http://foo.com"

site = requests.get(url)
print(site.url)
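requests follows redirects for GET by default and keeps the intermediate responses in site.history, so you can inspect every hop, not only the final URL. The same idea with only the standard library is sketched below, using a subclass of urllib.request.HTTPRedirectHandler that records each redirect before following it. The toy /a -> /b -> /c server and the class names are hypothetical, chosen so the example runs offline.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ChainHandler(BaseHTTPRequestHandler):
    """Toy server with a two-hop chain: /a -> /b -> /c (final)."""

    CHAIN = {"/a": "/b", "/b": "/c"}

    def do_GET(self):
        target = self.CHAIN.get(self.path)
        if target:
            self.send_response(302)
            self.send_header("Location", target)
        else:
            self.send_response(200)
            self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Remembers each redirect (status code, absolute target) before following it."""

    def __init__(self):
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


server = HTTPServer(("127.0.0.1", 0), ChainHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

recorder = RecordingRedirectHandler()
# ProxyHandler({}) ignores any *_proxy environment variables for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), recorder)
with opener.open(base + "/a", timeout=5) as resp:
    final_url = resp.geturl()

server.shutdown()
server.server_close()
print(recorder.hops)
print(final_url)
```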

Upvotes: 4

user387049

Reputation: 6877

Unshorten.me has an API that lets you send a JSON or XML request and get the full URL back.

Upvotes: 5

Amir Krifa

Reputation: 61

Here is source code that handles most of the useful corner cases:

  • set a custom timeout.
  • set a custom User-Agent.
  • check whether to use an HTTP or HTTPS connection.
  • resolve the input URL recursively and avoid getting stuck in a loop.

The source code is on GitHub at https://github.com/amirkrifa/UnShortenUrl

Comments are welcome.

import logging
import traceback

try:  # Python 2
    import httplib
    import urlparse
except ImportError:  # Python 3
    import http.client as httplib
    import urllib.parse as urlparse

logging.basicConfig(level=logging.DEBUG)

TIMEOUT = 10


class UnShortenUrl:
    def process(self, url, previous_url=None):
        logging.info('Init url: %s', url)
        try:
            parsed = urlparse.urlparse(url)
            # Pick an HTTP or HTTPS connection based on the URL scheme.
            if parsed.scheme == 'https':
                h = httplib.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
            else:
                h = httplib.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
            resource = parsed.path or '/'
            if parsed.query != "":
                resource += "?" + parsed.query
            try:
                # HEAD is enough: only the status and headers are needed.
                h.request('HEAD',
                          resource,
                          headers={'User-Agent': 'curl/7.38.0'})
                response = h.getresponse()
            except Exception:
                traceback.print_exc()
                return url

            logging.info('Response status: %d', response.status)
            if response.status // 100 == 3 and response.getheader('Location'):
                # Location may be relative, so resolve it against the current URL.
                red_url = urlparse.urljoin(url, response.getheader('Location'))
                logging.info('Red, previous: %s, %s', red_url, previous_url)
                # Stop if we are bouncing back to the URL we just came from.
                if red_url == previous_url:
                    return red_url
                return self.process(red_url, previous_url=url)
            else:
                return url
        except Exception:
            traceback.print_exc()
            return None
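The loop check in process() only compares the new Location with the immediately previous URL, so a longer cycle such as A -> B -> C -> A keeps recursing until Python's recursion limit. Below is a sketch of an alternative that remembers every URL it has visited and caps the number of hops. get_location is a hypothetical callback (for example, a wrapper around the HEAD request in process()) that returns the Location header for a 3xx response, or None; max_hops=10 is an arbitrary default.

```python
from urllib.parse import urljoin


def resolve(url, get_location, max_hops=10):
    """Follow redirects until a non-redirect, a repeated URL, or max_hops."""
    seen = {url}
    for _ in range(max_hops):
        location = get_location(url)
        if not location:
            return url  # not a redirect: this is the final URL
        nxt = urljoin(url, location)  # Location may be relative
        if nxt in seen:
            return nxt  # cycle detected: stop instead of looping forever
        seen.add(nxt)
        url = nxt
    return url  # gave up after max_hops redirects


# Offline usage with a dict standing in for real HEAD requests:
chain = {"http://s/a": "http://s/b", "http://s/b": "/c"}
print(resolve("http://s/a", chain.get))  # http://s/c
```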

Upvotes: 1

GermainZ

Reputation: 1943

Using requests:

import requests

url = "http://bit.ly/silly"  # the short (or already long) URL to resolve
session = requests.Session()  # so connections are recycled
resp = session.head(url, allow_redirects=True)
print(resp.url)
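Some servers answer HEAD with 405 Method Not Allowed (or 501) instead of a redirect, in which case session.head(...) stops at the original URL. Here is a stdlib-only sketch of a HEAD-first, GET-fallback resolver; the choice of 405/501 as the retry trigger, the function name final_url, and the toy server are assumptions for illustration, not part of the answer above.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Plain opener that ignores *_proxy environment variables (handy for the local demo).
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def final_url(url, timeout=10):
    """Follow redirects using HEAD; retry with GET if the server refuses HEAD."""
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with OPENER.open(req, timeout=timeout) as resp:
                return resp.geturl()
        except urllib.error.HTTPError as err:
            if method == "HEAD" and err.code in (405, 501):
                continue  # HEAD not allowed/implemented: fall back to GET
            raise


class NoHeadHandler(BaseHTTPRequestHandler):
    """Toy server that rejects HEAD; GET /short redirects to /long."""

    def do_HEAD(self):
        self.send_response(405)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        if self.path == "/short":
            self.send_response(301)
            self.send_header("Location", "/long")
        else:
            self.send_response(200)
            self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), NoHeadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port
result = final_url(base + "/short")
server.shutdown()
server.server_close()
print(result)
```

Trying HEAD first keeps the common case cheap (no body is downloaded); the GET retry only costs extra when a server refuses HEAD.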

Upvotes: 34

DmitrySandalov

Reputation: 4109

Use urlclean (http://github.com/stef/urlclean):

sudo pip install urlclean

then, in Python:

import urlclean
urlclean.unshorten(url)

Upvotes: 1

Adam Rosenfield

Reputation: 400692

Send an HTTP HEAD request to the URL and look at the response code. If the code is 3xx, look at the Location header to get the unshortened URL. Otherwise, if the code is 2xx, the URL is not redirected; you probably also want to handle error codes (4xx and 5xx) in some fashion. For example:

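# Works on Python 2 (httplib, urlparse) and Python 3 (http.client, urllib.parse).
try:
    import httplib
    import urlparse
except ImportError:
    import http.client as httplib
    import urllib.parse as urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    # Use HTTPS when the URL asks for it, plain HTTP otherwise.
    if parsed.scheme == 'https':
        h = httplib.HTTPSConnection(parsed.netloc)
    else:
        h = httplib.HTTPConnection(parsed.netloc)
    # Keep the query string; many redirectors depend on it.
    resource = parsed.path or '/'
    if parsed.query:
        resource += '?' + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # 3xx with a Location header means a redirect; otherwise return the input.
    if response.status // 100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url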
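The response-code rules described above can be isolated into a pure function, which makes them easy to test without touching the network. This is a minimal sketch with next_step as a hypothetical name; it treats a 3xx without a Location header (such as 304) as an error, which is one possible reading of the rules.

```python
def next_step(status, location=None):
    """Classify one HEAD response (hypothetical helper).

    Returns ("redirect", location), ("final", None) or ("error", status).
    """
    if 300 <= status < 400 and location:
        return ("redirect", location)  # follow the Location header
    if 200 <= status < 300:
        return ("final", None)  # not redirected: the URL is already unshortened
    return ("error", status)  # 4xx, 5xx, or a 3xx with no Location


print(next_step(301, "http://www.google.com/"))  # ('redirect', 'http://www.google.com/')
print(next_step(200))                            # ('final', None)
```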

Upvotes: 40

hughdbrown

Reputation: 49063

Open the url and see what it resolves to:

>>> import urllib2
>>> a = urllib2.urlopen('http://bit.ly/cXEInp')
>>> print a.url
http://www.flickr.com/photos/26432908@N00/346615997/sizes/l/
>>> a = urllib2.urlopen('http://google.com')
>>> print a.url
http://www.google.com/
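The question asks for an input array mapped to an output array, and the session above shows both cases: a short link resolves to its target, and a non-shortened URL comes back essentially unchanged (here normalized by the server to http://www.google.com/). This is a sketch of the batch step, with the single-URL resolver passed in as resolve_one, a hypothetical parameter that any of the approaches on this page could fill. Caching repeated URLs and falling back to the input on errors are design assumptions.

```python
def unshorten_all(urls, resolve_one):
    """Map a list of URLs to their unshortened forms, preserving order.

    Each distinct URL is resolved once; a URL whose resolution raises is kept as-is.
    """
    cache = {}
    out = []
    for url in urls:
        if url not in cache:
            try:
                cache[url] = resolve_one(url)
            except Exception:
                cache[url] = url  # network/HTTP failure: keep the input
        out.append(cache[url])
    return out


# Offline usage with a fake resolver mirroring the question's examples:
fake = {"bit.ly/silly": "google.com"}
print(unshorten_all(["bit.ly/silly", "google.com"], lambda u: fake.get(u, u)))
# ['google.com', 'google.com']
```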

Upvotes: 4
