Removing '#' from the scraped links

Question

Hi, I am a beginner with web scraping. I am trying to scrape all the links from a website and have been successful to some extent.

import requests
from bs4 import BeautifulSoup

url = 'https://www.marian.ac.in/'

response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

# Bare expressions like soup.title only display in a REPL/notebook; print them in a script
print(soup.title.string)

for link in soup.find_all('a', href=True):
    print(link['href'])

The issue I am facing is that the output contains '#' entries. How can I remove them?

Can anyone help with this?

Roy · Accepted Answer

The # entries you are getting come from <a> tags on the website whose href attribute is literally "#". We can simply filter them out by adding an if condition inside the for loop, like this:

for link in soup.find_all('a', href=True):
    # Skip anchors whose href is only "#" (ignoring surrounding whitespace)
    if link['href'].strip() != "#":
        print(link['href'])

This will still return a few entries that are not absolute URLs, such as "javascript:void(0);" (a JavaScript pseudo-link) or "semester-register-login" (a relative path). If we don't want those entries either, we need to modify the condition.
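One way to broaden the condition is sketched below, assuming the goal is to keep only absolute http(s) links. The helper name `clean_links` is hypothetical; it uses only the standard library's `urllib.parse`, treats any href starting with "#" as an in-page anchor to skip, resolves relative paths against the page URL from the question, and drops other schemes such as `javascript:` or `mailto:`.

```python
from urllib.parse import urljoin, urlparse

# Assumption: the base URL is the page the links were scraped from (from the question).
BASE_URL = "https://www.marian.ac.in/"


def clean_links(hrefs, base_url=BASE_URL):
    """Hypothetical helper: return absolute http(s) URLs from raw href values."""
    cleaned = []
    for href in hrefs:
        href = href.strip()
        # Skip empty hrefs and in-page anchors such as "#" or "#top".
        if not href or href.startswith("#"):
            continue
        # Resolve relative paths such as "semester-register-login" against the page URL;
        # hrefs with their own scheme (e.g. "javascript:...") are returned unchanged.
        absolute = urljoin(base_url, href)
        # Keep only web links; this drops javascript:, mailto:, tel:, etc.
        if urlparse(absolute).scheme in ("http", "https"):
            cleaned.append(absolute)
    return cleaned


if __name__ == "__main__":
    # Sample href values for illustration only.
    sample = ["#", " # ", "#top", "javascript:void(0);",
              "semester-register-login", "https://example.com/page"]
    for url in clean_links(sample):
        print(url)
```

With the `soup` object from the question, it could be called as `clean_links(link['href'] for link in soup.find_all('a', href=True))`. Resolving relative paths with `urljoin` keeps links like "semester-register-login" usable instead of discarding them; if you only want links that were already absolute, drop the `urljoin` step and check `urlparse(href).scheme` directly.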
