How to remove a href tags from a string?

Question

I have some user reviews that were previously scraped from a website, and I am trying to clean up the text for text analysis. There are several a href tags in the text that I would like to remove. For example, see a portion of text contained in a paragraph:

'We had a


I would like to remove this portion from the string:

I am not an expert on regex, so the best I could do so far is:
import re
re.sub(r'

But this removes only part of what I want to get rid of, as shown below:
print(mytext)
'We had a  target="_blank" rel="nofollow">restaurants.com

I searched a lot for a solution but could only find one for JavaScript, plus several posts that warn against using regex to parse HTML, which I guess does not apply to my case since I am processing a string. I guess I could get this done if I read more about regex, but I am looking for a quick solution. I would really appreciate any help.

AlecZ · Accepted Answer

import re
''.join(re.findall('(


That will work for your example. If you have multiple href links, you could use:
[''.join(entry) for entry in re.findall('(
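
For comparison, here is a minimal sketch of a related approach using `re.sub` instead of `re.findall`. It assumes the links appear as ordinary `<a ...>text</a>` elements that are not nested. The sample string and the names `ANCHOR_RE`, `remove_links`, and `unwrap_links` are hypothetical and only for illustration; this is not the exact pattern from the answer above.

```python
import re

# Hypothetical sample text; the real review strings are assumed to look similar.
sample = ('We had a <a href="http://example.com" target="_blank" '
          'rel="nofollow">restaurants.com</a> voucher for dinner.')

# Matches an opening <a ...> tag, its contents (non-greedy), and the closing </a>.
# \b keeps the pattern from matching other tags that start with "a", such as <abbr>.
ANCHOR_RE = re.compile(r'<a\b[^>]*>(.*?)</a>', flags=re.IGNORECASE | re.DOTALL)

def remove_links(text):
    """Drop each <a ...>...</a> element entirely, including the link text."""
    return ANCHOR_RE.sub('', text)

def unwrap_links(text):
    """Drop only the tags and keep the visible link text."""
    return ANCHOR_RE.sub(r'\1', text)

print(remove_links(sample))
print(unwrap_links(sample))
```

Because `re.sub` replaces every non-overlapping match, the same call handles strings with several links. Removing a link entirely can leave a doubled space where it used to be, so a follow-up such as `' '.join(text.split())` may be useful if whitespace matters for the analysis.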
