Reputation: 7
I need a regular expression to strip the HTML tags from some links, but not all of them.
Example:
<a href="falanfilan.com" target="_blank"> link </a>
<a href="sample.com" target="_blank"> fasafiso </a>
should be converted to:
<a href="falanfilan.com" target="_blank"> link </a>
fasafiso
Upvotes: 0
Views: 37
Reputation: 24812
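I'll assume you want to replace every link whose href is sample.com
with its content:
match <a[^>]*href="sample\.com"[^>]*>([^<]*)</a>
replace with \1
For example, with sed (-E enables unescaped capture groups, # is used as the delimiter because the pattern itself contains a /, and g handles several links on one line):
sed -E 's#<a[^>]*href="sample\.com"[^>]*>([^<]*)</a>#\1#g'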
Please also keep in mind that if your requirements are complex enough you should instead be using an HTML parser.
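As a rough illustration of the match-and-replace approach above in a general-purpose language, here is a minimal Python sketch. The function name unwrap_sample_links is made up for this example, and the .strip() call is an addition (not part of the pattern above) so that the result matches the "fasafiso" shown in the question rather than " fasafiso ".

```python
import re

# Same pattern as above: an <a> tag whose href is exactly "sample.com",
# capturing the link text (anything up to the next "<") in group 1.
LINK_RE = re.compile(r'<a[^>]*href="sample\.com"[^>]*>([^<]*)</a>')


def unwrap_sample_links(html: str) -> str:
    """Replace each matching link with its text content.

    Hypothetical helper; the strip() is an assumption about the desired
    whitespace handling and can be dropped to keep the text verbatim.
    """
    return LINK_RE.sub(lambda m: m.group(1).strip(), html)


html = (
    '<a href="falanfilan.com" target="_blank"> link </a>\n'
    '<a href="sample.com" target="_blank"> fasafiso </a>'
)
print(unwrap_sample_links(html))
```

Like the sed version, this only handles links whose content is plain text; a link containing nested tags will not match because of [^<]*.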
Upvotes: 0
Reputation: 43169
Depending on your programming language, you could come up with something like:
~<a href="sample\.com" [^>]*>(.*?)</a>~
# delimiter ~
# look for <a href="sample.com" followed by a space, then anything that is not >, then >
# capture everything lazily in a group
# look for a closing tag
# delimiter ~
In your example, group 1 would hold " fasafiso " (including the surrounding spaces), and you can insert it into the replacement via $1.
See a demo for this approach on regex101.com.
This is just a quick-and-dirty solution (e.g. for text editors). If this is getting more complicated, consider using a parser instead.
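For the "use a parser" route, here is one possible sketch using Python's standard-library html.parser. The class LinkUnwrapper and the function unwrap_links are names invented for this example. It re-emits the input as it goes and drops only the opening and closing tags of links whose href equals the given value. It is deliberately simplified: doctype declarations and processing instructions are not re-emitted, and closing tags are written back in lowercase.

```python
from html.parser import HTMLParser


class LinkUnwrapper(HTMLParser):
    """Re-emit HTML, replacing <a href=target_href> elements by their contents."""

    def __init__(self, target_href):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.target_href = target_href
        self.parts = []
        self.open_links = []  # one flag per open <a>: True if it is being unwrapped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            unwrap = dict(attrs).get("href") == self.target_href
            self.open_links.append(unwrap)
            if unwrap:
                return  # drop the opening tag, keep what follows
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links and self.open_links.pop():
            return  # drop the closing tag of an unwrapped link
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def unwrap_links(html, target_href):
    parser = LinkUnwrapper(target_href)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    '<a href="falanfilan.com" target="_blank"> link </a>\n'
    '<a href="sample.com" target="_blank"> fasafiso </a>'
)
print(unwrap_links(sample, "sample.com"))
```

Unlike the regex, this does not depend on attribute order or spacing inside the tag, and it keeps nested markup inside an unwrapped link intact.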
Upvotes: 1