Ioannis Petridis

Reputation: 147

Removing <a href="#"> tags in Python

from sys import argv
import urllib2
from bs4 import BeautifulSoup
arg = urllib2.urlopen(argv[1]).read()
soup = BeautifulSoup(arg, 'html.parser')
a_tags = soup.find_all('a')  # list of every <a> tag in the page

I need only the links that do not point to the same page, i.e. those without a # in their href.

Any help is appreciated.

Upvotes: 0

Views: 260

Answers (1)

Blender

Reputation: 298364

You can match the href attribute with a function. BeautifulSoup calls the function with each tag's href value, or with None if the tag has no href:

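# "value and ..." skips <a> tags without an href (value is None)
for a in soup.find_all('a', href=lambda value: value and value.startswith('#')):
    a.extract()  # remove the same-page link from the tree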
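If you would rather keep the tree intact instead of deleting the same-page links, you can invert the test and select the wanted links directly. This is a minimal sketch assuming BeautifulSoup 4 (bs4) with the built-in html.parser; the sample HTML and the is_external helper are hypothetical. Note that startswith('#') keeps a link such as docs#install, which points to an anchor on another page; testing for '#' anywhere in the href would drop it as well.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in the question the HTML comes from
# urllib2.urlopen(argv[1]).read()
html = """
<p>
  <a href="#top">Back to top</a>
  <a href="/about">About</a>
  <a href="https://example.com/docs#install">Docs</a>
  <a name="anchor-only">No href</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")


def is_external(href):
    """Hypothetical helper: True for links that leave the current page.

    BeautifulSoup passes None when a tag has no href attribute,
    so check for that before calling startswith().
    """
    return href is not None and not href.startswith("#")


# Select only the wanted links; the parse tree is not modified.
links = soup.find_all("a", href=is_external)
hrefs = [a["href"] for a in links]
print(hrefs)
```

Because nothing is extracted, a later soup.find_all('a') still returns all four tags.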

Upvotes: 2
