Ossama

Reputation: 2433

Extract a link from a website using BeautifulSoup

I have been trying to extract the link (i.e. /d/Hinchinbrook+25691+Masjid-Bilal) from the "result" element below using BeautifulSoup in Python, without success. Can anyone help?

result:

<div class="subtitleLink"><a href="/d/Hinchinbrook+25691+Masjid-Bilal"><b>Masjid Bilal</b></a></div>

code:

import urllib2
from bs4 import BeautifulSoup

url1 = "http://www.salatomatic.com/c/Sydney+168"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1)
results = soup.findAll("div", {"class" : "subtitleLink"})
for result in results:
    print result
    br = result.find('a')
    pos = br.get_text()
    print pos

Upvotes: 0

Views: 216

Answers (2)

jonafato

Reputation: 1605

The get_text method returns only the text content of a tag (including its descendants), not its attributes. The link lives in the href attribute, so read it as an attribute instead. In this case, change br.get_text() to br['href'] to get the desired result.

...
>>> br = result.find('a')
>>> pos = br['href']
>>> print pos
/d/Hinchinbrook+25691+Masjid-Bilal
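For readers on Python 3, here is a minimal offline sketch of the same idea, assuming the bs4 package is installed. It parses the question's "result" snippet directly (no network request) and contrasts get_text() with the two ways of reading an attribute: tag['href'], which raises KeyError when the attribute is absent, and tag.get('href'), which returns None instead.

```python
# Python 3 sketch; assumes the bs4 package is installed.
from bs4 import BeautifulSoup

# The "result" snippet from the question, parsed offline instead of fetched.
html = '<div class="subtitleLink"><a href="/d/Hinchinbrook+25691+Masjid-Bilal"><b>Masjid Bilal</b></a></div>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("div", class_="subtitleLink").find("a")

text = link.get_text()   # text content only, taken from the nested <b>
href = link["href"]      # attribute lookup; raises KeyError if href is missing
safe = link.get("href")  # attribute lookup; returns None if href is missing

print(text)  # Masjid Bilal
print(href)  # /d/Hinchinbrook+25691+Masjid-Bilal
```

Use .get('href') when some <a> tags on a page might lack an href, so a single missing attribute does not abort the loop.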

Upvotes: 2

Matt

Reputation: 3557

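import urllib2
from bs4 import BeautifulSoup

url1 = "http://www.salatomatic.com/c/Sydney+168"
content1 = urllib2.urlopen(url1).read()
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content1, "html.parser")
# print the href of every <a> tag on the page (None if a tag has no href)
for link in soup.findAll('a'):
    print link.get('href')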

This should work if you want all the links on the page (every <a> tag, not only those inside the subtitleLink divs). Let me know if it doesn't.
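If you only want the links inside the subtitleLink divs rather than every <a> on the page, a CSS selector can narrow the search, and urllib.parse.urljoin can turn the relative paths into absolute URLs. The sketch below is Python 3 and assumes bs4 is installed; the HTML is a hypothetical stand-in for the page (the "nav" div is invented to show that it gets filtered out), and subtitle_links is an illustrative helper name, not part of any library.

```python
# Python 3 sketch; assumes bs4 is installed. The HTML is a hypothetical
# stand-in for the real page, so no network request is made.
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.salatomatic.com/c/Sydney+168"

html = """
<div class="subtitleLink"><a href="/d/Hinchinbrook+25691+Masjid-Bilal"><b>Masjid Bilal</b></a></div>
<div class="nav"><a href="/about">About</a></div>
"""


def subtitle_links(page_html, base_url):
    """Return absolute hrefs of <a> tags nested in div.subtitleLink (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    # CSS selector: <a> elements that have an href, inside a div with class subtitleLink
    anchors = soup.select("div.subtitleLink a[href]")
    # urljoin resolves "/d/..." against the page URL's scheme and host
    return [urljoin(base_url, a["href"]) for a in anchors]


links = subtitle_links(html, BASE_URL)
for link in links:
    print(link)
```

Against the live site you would pass the downloaded HTML instead of the inline string; the site's markup may have changed since the question was asked, so the selector is an assumption.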

Upvotes: 2
