Justin

Reputation: 86779

Simple Python / Beautiful Soup type question

I'm trying to do some simple string manipulation with the href attribute of a hyperlink extracted using Beautiful Soup:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href="http://www.some-site.com/">Some Hyperlink</a>')
href = soup.find("a")["href"]
print href
print href[href.indexOf('/'):]

All I get is:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print href[href.indexOf('/'):]
AttributeError: 'unicode' object has no attribute 'indexOf'

How should I convert whatever href is into a normal string?

Upvotes: 5

Views: 5575

Answers (3)

hughdbrown

Reputation: 49033

You mean find(), not indexOf().

Python docs on strings.

Upvotes: 0

codeape

Reputation: 100826

Python strings do not have an indexOf method.

Use href.index('/')

href.find('/') is similar, but find returns -1 if the substring is not found, while index raises a ValueError.

So index is the safer choice here: if '/' were missing, find would return -1, and href[-1:] would silently give you the last character of the string instead of failing.
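To make the difference concrete, here is a minimal sketch. It uses a plain unicode literal in place of the value returned by Beautiful Soup, and the second string (no_slash) is a made-up input chosen only to show the missing-substring case.

```python
# Stand-in for the href value pulled out of the soup (assumed input).
href = u"http://www.some-site.com/"

# When the substring is present, index() and find() return the same offset.
first_slash = href.index('/')
same_as_find = (first_slash == href.find('/'))
rest = href[first_slash:]  # everything from the first '/' onwards

# Hypothetical string without any '/', to show how the two methods differ.
no_slash = u"no-slash-here"

# find() reports the miss as -1, which is still a valid slice offset,
# so the slice quietly yields the last character.
missing = no_slash.find('/')
silently_wrong = no_slash[missing:]

# index() reports the miss by raising ValueError instead.
try:
    no_slash.index('/')
    index_raised = False
except ValueError:
    index_raised = True
```

With find you would have to check for -1 yourself before slicing; with index the bad input surfaces as an exception at the point where it happens.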

Upvotes: 10

Marius

Reputation: 3617

href is a unicode string. If you need a regular (byte) string, use

regular_string = str(href)
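One caveat, as a hedged sketch: in Python 2, str() on a unicode object encodes with the ASCII codec, so it raises UnicodeEncodeError as soon as the URL contains a non-ASCII character. Encoding explicitly avoids that. The URL below is an invented example, and UTF-8 is an assumption about which encoding you want.

```python
# Hypothetical href containing a non-ASCII character (e with acute accent).
href = u"http://www.some-site.com/caf\xe9"

# Explicit encoding gives a byte string without relying on the ASCII default.
encoded = href.encode("utf-8")

# Note that no conversion is needed just to search or slice:
# unicode strings support index(), find() and slicing directly.
path_start = href.index("/")
tail = href[path_start:]
```

For the original problem the conversion is optional, since the AttributeError came from the method name rather than from the string type.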

Upvotes: 0
