Sainita

Reputation: 362

How to strip this link to remove unwanted data (bs4)?

This is what the HTML looks like:

<div class="full-news none">
     Demo: <a href="https://www.lolinez.com/?https://www.makemytrip.com" 
    rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
   <br/>

How can I remove the https://www.lolinez.com/? prefix from the href, so that the final output looks like this:

 <div class="full-news none">
         Demo: <a href="https://www.makemytrip.com" 
        rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
       <br/>

I have tried using Beautiful Soup's decompose() function, but it removes the entire tag. How can this be fixed?

Upvotes: 0

Views: 212

Answers (1)

HedgeHog

Reputation: 25196

Note: Without additional context, I would narrow it down to the following approaches.

Option#1

Replace the substring in the string that you pass to the BeautifulSoup constructor:

soup = BeautifulSoup(YOUR_STRING.replace('https://www.lolinez.com/?',''), 'lxml')
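
One thing to keep in mind with Option#1 is that str.replace works on the raw markup, so it removes the prefix everywhere, not only inside href attributes. The following is a minimal sketch of that behaviour. The input string is made up for illustration, and 'html.parser' is used only so the snippet does not depend on lxml:

```python
from bs4 import BeautifulSoup

PREFIX = 'https://www.lolinez.com/?'

# Hypothetical input: the prefix appears in an href and in visible text.
html = (
    '<p>Redirects go through https://www.lolinez.com/?</p>'
    '<a href="https://www.lolinez.com/?https://www.makemytrip.com">link</a>'
)

# Option#1: clean the raw string before it is parsed.
soup = BeautifulSoup(html.replace(PREFIX, ''), 'html.parser')

print(soup.a['href'])     # href without the redirect prefix
print(soup.p.get_text())  # the visible text lost the prefix as well
```

If only the attributes should change, Option#2 is probably the safer choice.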

Option#2

Replace the substring in your soup: select all <a> elements whose href contains www.lolinez.com and replace the value of each href:

for x in soup.select('a[href*="www.lolinez.com"]'):
    x['href'] = x['href'].replace('https://www.lolinez.com/?','')
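
As a side note on why decompose() did not help: it removes the tag itself from the tree, whereas assigning to tag['href'] changes only that attribute. A small sketch of the difference, using a shortened version of the question's markup and 'html.parser' (an assumption made to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

html = '<div>Demo: <a href="https://www.lolinez.com/?https://www.makemytrip.com">site</a></div>'

# decompose() deletes the tag itself, so the link is gone afterwards.
removed = BeautifulSoup(html, 'html.parser')
removed.a.decompose()
print(removed)

# Assigning to the attribute keeps the tag and changes only href.
kept = BeautifulSoup(html, 'html.parser')
kept.a['href'] = kept.a['href'].replace('https://www.lolinez.com/?', '')
print(kept)
```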

Example

from bs4 import BeautifulSoup

html='''
<a href="https://www.lolinez.com/?https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
<a href="https://www.lolinez.com/?https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a>
'''

soup = BeautifulSoup(html, 'lxml')

for x in soup.select('a[href*="www.lolinez.com"]'):
    x['href'] = x['href'].replace('https://www.lolinez.com/?','')
    
print(soup)

Output

<html><body><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a><a href="https://www.makemytrip.com" rel="external noopener noreferrer" target="_blank">https://www.makemytrip.com</a></body></html>
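
If the prefix could also show up later in a URL (for example inside a query string), a slightly stricter variant strips it only when the href actually starts with it. This is a sketch under that assumption. strip_redirect is a hypothetical helper name, not part of Beautiful Soup:

```python
PREFIX = 'https://www.lolinez.com/?'

def strip_redirect(href, prefix=PREFIX):
    """Return href without a leading redirect prefix; other values are unchanged."""
    if href.startswith(prefix):
        return href[len(prefix):]
    return href

hrefs = [
    'https://www.lolinez.com/?https://www.makemytrip.com',
    'https://www.makemytrip.com',
]
cleaned = [strip_redirect(h) for h in hrefs]
print(cleaned)
```

In the loop above, this would be used as x['href'] = strip_redirect(x['href']).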

Upvotes: 2
