a1204773

Reputation: 7043

Remove all <a> tags

I scraped a container that includes URLs, for example:

<a href="url">text</a>

I need all of the <a> tags removed so that only the text remains.

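from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

site = "http://mysite.com"
page = urlopen(site)
# Name the parser explicitly so results do not depend on which parsers are installed.
soup = BeautifulSoup(page, "html.parser")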

Is it possible?

Upvotes: 2

Views: 2507

Answers (2)

bossylobster

Reputation: 10163

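soup = BeautifulSoup(page, "html.parser")
# unwrap() removes each <a> tag but keeps its children (the link text) in place.
# It is the current bs4 name for replaceWithChildren(); find_all() likewise replaces findAll().
for anchor in soup.find_all("a"):
    anchor.unwrap()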
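
To try this without fetching a page, here is a minimal self-contained sketch. The helper name remove_anchor_tags and the sample markup are made up for illustration, and it assumes bs4 is installed and that the built-in "html.parser" backend is acceptable.

```python
from bs4 import BeautifulSoup


def remove_anchor_tags(html):
    """Return html with every <a> tag removed but its contents kept."""
    soup = BeautifulSoup(html, "html.parser")
    for anchor in soup.find_all("a"):
        # unwrap() replaces the tag with its children, so nested markup
        # such as <b> inside the link survives.
        anchor.unwrap()
    return str(soup)


# Hypothetical sample input, not taken from the question's site.
sample = '<div>See <a href="url">text</a> and <a href="other"><b>bold</b> link</a>.</div>'
result = remove_anchor_tags(sample)
print(result)
```

If only the plain text is wanted and the other tags can go too, soup.get_text() may be simpler than unwrapping each anchor.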

Upvotes: 4

Jonathan Vanasco

Reputation: 15680

You can also do this with Bleach:

PyPI - Bleach

>>> import bleach

>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'

>>> bleach.linkify('an http://example.com url')
u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'

>>> bleach.delinkify('a <a href="http://ex.mp">link</a>')
u'a link'
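
As far as I know, delinkify() only exists in early Bleach releases and is missing from later ones. A rough stand-in, assuming a Bleach version where clean() accepts the tags and strip arguments, is to leave "a" out of the allowed tags and strip whatever is not allowed. The helper name strip_links and the keep_tags default below are my own choices for illustration, not part of Bleach.

```python
import bleach


def strip_links(html, keep_tags=("b", "em", "i", "strong")):
    """Drop <a> tags (and any other tag not in keep_tags) but keep their text."""
    # strip=True removes disallowed tags instead of escaping them.
    return bleach.clean(html, tags=list(keep_tags), strip=True)


print(strip_links('a <a href="http://ex.mp">link</a>'))
```

Unlike the BeautifulSoup approach, this also removes every other tag that is not listed in keep_tags, so it fits best when the goal is sanitised text rather than surgical removal of anchors only.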

Upvotes: 6
