Reputation: 307
Input:
source http://www.emaxhealth.com/1275/misdiagnosing from here http://www.cancerresearchuk.org/about-cancer/type recounting her experiences and thoughts blog http://fty720.blogspot.com even carried the new name. She was far from home.
From the above input I want to remove the hyperlinks. Below is the regex that I am trying:
http://[\w|\W|\d|\s]*(?=[ ])
I expected this regex to match all characters, digits, and whitespace after the text 'http' and to stop at the first blank space. Unfortunately, it is not working as expected. Please help me find my error. Thanks.
Upvotes: 1
Views: 204
Reputation: 4887
To find the hyperlinks, use the following with case-insensitive matching (the character classes list only A-Z):
\b(https?)://[A-Z0-9+&@#/%?=~_|$!:,.;-]*[A-Z0-9+&@#/%=~_|$]
Or, if you want to find an HTML <a> tag, use:
<a\b[^>]*>(.*?)</a>
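As a rough illustration of the URL pattern above, here is a minimal Python sketch that applies it to the sample text from the question. Python, the re.IGNORECASE flag, and the helper name strip_urls are assumptions made for this example (the question does not name a language); the extra step that collapses doubled spaces is also an addition, not part of the original answer.

```python
import re

# Sample text from the question (URLs kept verbatim).
text = (
    "source http://www.emaxhealth.com/1275/misdiagnosing from here "
    "http://www.cancerresearchuk.org/about-cancer/type recounting her "
    "experiences and thoughts blog http://fty720.blogspot.com even carried "
    "the new name. She was far from home."
)

# URL pattern from the answer. IGNORECASE is required because the
# character classes only list uppercase letters.
URL_RE = re.compile(
    r"\b(https?)://[A-Z0-9+&@#/%?=~_|$!:,.;-]*[A-Z0-9+&@#/%=~_|$]",
    re.IGNORECASE,
)


def strip_urls(s: str) -> str:
    """Remove every URL, then collapse the doubled spaces left behind."""
    without_urls = URL_RE.sub("", s)
    return re.sub(r" {2,}", " ", without_urls)


print(strip_urls(text))
```

The last character class omits punctuation such as '.', ',' and ';', so a full stop that directly follows a URL at the end of a sentence stays in the text instead of being swallowed with the link.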
Upvotes: 1
Reputation: 5092
Try this sed command (the \+ quantifier is a GNU sed extension):
sed 's/http[^ ]\+//g' FileName
Output :
source from here recounting her experiences and thoughts blog even carried the new name. She was far from home.
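To show why the pattern in the question overshoots and why the [^ ] class fixes it, here is a small Python sketch. Python and the variable names greedy and fixed are assumptions for illustration; the sed command above is the actual answer. The class [\w|\W|\d|\s] matches every character, spaces included, so the greedy * runs to the end of the input and backtracks only as far as the last space.

```python
import re

# Sample text from the question (URLs kept verbatim).
text = (
    "source http://www.emaxhealth.com/1275/misdiagnosing from here "
    "http://www.cancerresearchuk.org/about-cancer/type recounting her "
    "experiences and thoughts blog http://fty720.blogspot.com even carried "
    "the new name. She was far from home."
)

# Pattern from the question: \w together with \W covers every character,
# so the match extends from the first URL up to the last space in the text.
greedy = re.sub(r"http://[\w|\W|\d|\s]*(?=[ ])", "", text)

# Python counterpart of: sed 's/http[^ ]\+//g'
# [^ ] excludes the space, so each match stops at the end of its own URL.
fixed = re.sub(r"http[^ ]+", "", text)

print(repr(greedy))
print(repr(fixed))
```

Removing only the URL leaves the spaces on both sides of it in place, so the result contains doubled spaces where each link used to be; a second substitution can collapse them if that matters.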
Upvotes: 1