dhvlnyk

Reputation: 307

regex to remove hyperlinks

Input:
source http://www.emaxhealth.com/1275/misdiagnosing from here http://www.cancerresearchuk.org/about-cancer/type recounting her experiences and thoughts blog http://fty720.blogspot.com even carried the new name. She was far from home.


From the above input I want to remove the hyperlinks. Below is the regex that I am trying:

http://[\w|\W|\d|\s]*(?=[ ])

I expected this regex to match all characters, digits and whitespace after the text 'http://' and to stop at the first blank space. Unfortunately, it is not working as expected. Please help me find my error. Thanks.
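To make the symptom concrete, here is a minimal sketch of what the attempted pattern matches on the input above. Python's `re` module is an assumption here, since the question does not name a regex engine. Inside the brackets, `\w` together with `\W` covers every character, spaces included, and `*` is greedy, so the match runs on until the last space in the text rather than the first.

```python
import re

# Sample input copied from the question.
text = (
    "source http://www.emaxhealth.com/1275/misdiagnosing from here "
    "http://www.cancerresearchuk.org/about-cancer/type recounting her "
    "experiences and thoughts blog http://fty720.blogspot.com even carried "
    "the new name. She was far from home."
)

# The pattern from the question, unchanged.
attempt = r"http://[\w|\W|\d|\s]*(?=[ ])"

match = re.search(attempt, text)

# The match starts at the first URL and only stops before the LAST space,
# so almost the whole sentence is swallowed.
print(match.group())

# Removing the match therefore leaves only the text outside that span.
print(repr(re.sub(attempt, "", text)))  # 'source  home.'
```

The `|` characters inside the brackets are literal pipes, not alternation, so they do not narrow the class either.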

Upvotes: 1

Views: 204

Answers (2)

Andie2302

Reputation: 4887

To find the hyperlink, use the following with case-insensitive matching enabled (the character classes list only A-Z):

\b(https?)://[A-Z0-9+&@#/%?=~_|$!:,.;-]*[A-Z0-9+&@#/%=~_|$]

or:


If you want to find the HTML <a> tag, use:

<a\b[^>]*>(.*?)</a>
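As a rough illustration, the two patterns above could be applied in Python as follows. The choice of Python, the `re.IGNORECASE` and `re.DOTALL` flags, and the helper names `strip_urls` and `unwrap_anchors` are assumptions made for this sketch and are not part of the answer. Replacing an `<a>` element with its inner text is likewise just one possible use of the second pattern.

```python
import re

# URL pattern from the answer; IGNORECASE is needed because the
# character classes only list upper-case letters.
URL_PATTERN = re.compile(
    r"\b(https?)://[A-Z0-9+&@#/%?=~_|$!:,.;-]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)

# Anchor-tag pattern from the answer; DOTALL lets the link text span lines.
ANCHOR_PATTERN = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def strip_urls(text: str) -> str:
    """Hypothetical helper: delete every bare URL found by URL_PATTERN."""
    return URL_PATTERN.sub("", text)


def unwrap_anchors(html: str) -> str:
    """Hypothetical helper: replace each <a>...</a> with its inner text."""
    return ANCHOR_PATTERN.sub(r"\1", html)


if __name__ == "__main__":
    print(strip_urls("blog http://fty720.blogspot.com even carried"))
    print(unwrap_anchors('<a href="http://example.com">link</a> text'))
```

Because the final character class excludes `.`, `,` and similar punctuation, a full stop that directly follows a URL is left in the text rather than removed with it.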

Upvotes: 1

Kalanidhi

Reputation: 5092

Try this sed command (\+ is a GNU sed extension):

sed 's/http[^ ]\+//g' FileName

Output:

source from here recounting her experiences and thoughts blog even carried the new name. She was far from home.
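For readers without GNU sed, the same substitution can be sketched in Python. The function name `remove_links` is hypothetical, and the second step, which collapses runs of spaces, is an addition to the sed command: deleting a URL leaves the spaces on both sides of it next to each other.

```python
import re


def remove_links(line: str) -> str:
    """Hypothetical helper mirroring sed 's/http[^ ]\\+//g'."""
    # Drop "http" plus every following non-space character.
    without_links = re.sub(r"http[^ ]+", "", line)
    # Extra step (not in the sed command): squeeze repeated spaces.
    return re.sub(r" {2,}", " ", without_links)


if __name__ == "__main__":
    print(remove_links("blog http://fty720.blogspot.com even carried"))
```

Like the sed version, this stops at the first space after `http`, and it also removes `https` links since they begin with the same four letters.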

Upvotes: 1
