Sajjad

Reputation: 893

Unable to parse URL with Python urlparse

I am trying to write a small script that takes a URL as input and parses it.

This is my script:

#! /usr/bin/env python

import sys

from urlparse import urlsplit
url = sys.argv[1]
parseUrl = urlsplit(url)
print 'scheme  :', parseUrl.scheme
print 'netloc  :', parseUrl.netloc

But when I run the script with ./myscript http://www.example.com

it shows the following error:

AttributeError: 'tuple' object has no attribute 'scheme'

I am new to Python and scripting. What am I doing wrong?

Edit: The Python version I am using is 2.7.5.

Upvotes: 0

Views: 2107

Answers (2)

spookylukey

Reputation: 6576

Looking at the docs, it sounds like you are using Python 2.4, which does not have these attributes. The other answer left out the critical bit from the docs:

New in version 2.2.

Changed in version 2.5: Added attributes to return value.

You will have to access the tuple parts by index or unpacking:

scheme, netloc, path, query, fragment = urlsplit(url)
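A slightly fuller sketch of the unpacking approach, in case it helps. The URL is just an example I made up, and the try/except import is only there so the snippet runs on both Python 2 and Python 3:

```python
try:
    from urlparse import urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlsplit  # Python 3

# Example URL (an assumption for illustration, not taken from the question).
url = "http://www.example.com/index.html?q=1#top"

# The result is always a 5-tuple, so unpacking works whether or not
# the named attributes (.scheme, .netloc, ...) are available.
scheme, netloc, path, query, fragment = urlsplit(url)

print("scheme  : " + scheme)
print("netloc  : " + netloc)
print("path    : " + path)
```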

However, you should really upgrade to Python 2.7; Python 2.4 is no longer supported.
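If you are not sure which interpreter the `#! /usr/bin/env python` line actually picks up, a quick check along these lines might help. This is only a diagnostic sketch: it prints the interpreter path and version, the file the module was loaded from, and whether the result has the `.scheme` attribute. The variable names are my own.

```python
import sys

try:
    from urlparse import urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlsplit  # Python 3

result = urlsplit("http://www.example.com")

# Which interpreter is running this script, and which version is it?
print("interpreter : " + str(sys.executable))
print("version     : " + sys.version.split()[0])

# Which file did urlsplit come from? A stray local module with the same
# name as the standard one would show up here.
module = sys.modules[urlsplit.__module__]
print("module file : " + str(getattr(module, "__file__", "unknown")))

# On Python 2.5 and later the result should be a tuple subclass with attributes.
has_attrs = hasattr(result, "scheme")
print("result type : " + type(result).__name__)
print("has .scheme : " + str(has_attrs))
```

If `has .scheme` prints False, the script is most likely not running under the Python version you expect.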

Upvotes: 0

Dair

Reputation: 16240

You don't want scheme. Instead, in this case you want to access index 0 and index 1 of the tuple.

print 'scheme  :', parseUrl[0]
print 'netloc  :', parseUrl[1]

urlparse uses the .scheme and .netloc notation; urlsplit instead returns a tuple (refer to the appropriate index number). From the docs:

This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted. A separate function is needed to separate the path segments and parameters. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).
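To illustrate the difference described in that paragraph, here is a small sketch comparing the two functions on a URL that carries params on its last path segment. The URL is made up for the example, and the import fallback is only there so it runs on Python 2 and Python 3:

```python
try:
    from urlparse import urlparse, urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlparse, urlsplit  # Python 3

# Example URL (an assumption) with ";type=a" params on the last path segment.
url = "http://www.example.com/path;type=a?q=1"

parsed = urlparse(url)  # 6-tuple: params are split out of the path
split = urlsplit(url)   # 5-tuple: params stay inside the path

print("urlparse path   : " + parsed[2])
print("urlparse params : " + parsed[3])
print("urlsplit path   : " + split[2])
```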

The return value is actually an instance of a subclass of tuple. This class has the following additional read-only convenience attributes:

Attribute   Index   Value                               Value if not present
scheme      0       URL scheme specifier                empty string
netloc      1       Network location part               empty string
path        2       Hierarchical path                   empty string
query       3       Query component                     empty string
fragment    4       Fragment identifier                 empty string
username            User name                           None
password            Password                            None
hostname            Host name (lower case)              None
port                Port number as integer, if present  None
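As a rough illustration of the table above, this sketch reads a few of the listed attributes on Python 2.5 or later (where the return value has them). The URL, including the user name and password in it, is invented for the example:

```python
try:
    from urlparse import urlsplit      # Python 2
except ImportError:
    from urllib.parse import urlsplit  # Python 3

# Example URL (an assumption) that exercises most of the attributes.
url = "http://user:secret@WWW.Example.com:8080/docs?q=1#top"
parts = urlsplit(url)

# Indexed fields are also available by name.
print("scheme   : " + parts.scheme)    # same as parts[0]
print("netloc   : " + parts.netloc)    # same as parts[1]

# These attributes have no index; they are derived from netloc.
print("username : " + str(parts.username))
print("hostname : " + str(parts.hostname))  # lower-cased host name
print("port     : " + str(parts.port))      # integer, or None if absent
```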

Upvotes: 0
