emre

Reputation: 7

Regular expression to remove some links

I need a regular expression that strips the HTML tags from some links but leaves the others untouched.

Example:

<a href="falanfilan.com" target="_blank"> link </a>

<a href="sample.com" target="_blank"> fasafiso </a>

should be converted to:

<a href="falanfilan.com" target="_blank"> link </a>

fasafiso 

Upvotes: 0

Views: 37

Answers (2)

Aaron

Reputation: 24812

I'll assume you want to replace every link whose href is sample.com with its content:

match <a[^>]*href="sample\.com"[^>]*>([^<]*)</a>
replace with \1

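For example, with sed (-E enables the unescaped capture group, and ~ is used as the delimiter because the pattern contains a /):

sed -E 's~<a[^>]*href="sample\.com"[^>]*>([^<]*)</a>~\1~g'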
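
The same match-and-replace can be sketched in Python. This is only an illustration, assuming the HTML is available as a string; the function name unwrap_links is made up for this example, and the dot in sample.com is escaped so that it matches a literal dot only.

```python
import re

# Same pattern as above, with the dot escaped.
# [^<]* means the link text must not contain nested tags.
LINK = re.compile(r'<a[^>]*href="sample\.com"[^>]*>([^<]*)</a>')


def unwrap_links(html):
    """Replace every matching link with its text content (group 1)."""
    return LINK.sub(r"\1", html)


sample = (
    '<a href="falanfilan.com" target="_blank"> link </a>\n'
    '<a href="sample.com" target="_blank"> fasafiso </a>'
)
print(unwrap_links(sample))
```

Only the sample.com link is unwrapped; the falanfilan.com link does not match the pattern and is left as it is.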

Keep in mind that if your requirements get any more complex than this, you should use an HTML parser instead.

Upvotes: 0

Jan

Reputation: 43169

Depending on your programming language, you could come up with something like:

~<a href="sample\.com" [^>]*>(.*?)</a>~
# delimiter ~
# look for <a href="sample.com" and a space, then anything that is not >, then >
# capture everything lazily in group 1
# look for the closing </a> tag
# delimiter ~

In your example, group 1 would hold fasafiso, which you can insert in the replacement via $1. See a demo for this approach on regex101.com.
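
As a rough sketch of how this pattern behaves, here it is in Python, which needs no ~ delimiters and writes the group reference as \1 rather than $1. The function name strip_sample_links is invented for this example. Note that the pattern expects href to be the first attribute and to be followed by a space.

```python
import re

# The lazy group (.*?) stops at the first </a>, so link text that
# contains other tags (e.g. <b>) is still captured as a whole.
PATTERN = re.compile(r'<a href="sample\.com" [^>]*>(.*?)</a>')


def strip_sample_links(html):
    """Replace each matching link with whatever group 1 captured."""
    return PATTERN.sub(r"\1", html)


print(strip_sample_links('<a href="sample.com" target="_blank"> fasafiso </a>'))
print(strip_sample_links('<a href="sample.com" target="_blank"><b>bold</b></a>'))
```

Without re.DOTALL the dot does not match newlines, so a link whose text spans several lines would be left untouched by this sketch.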

Hint:

This is just a quick-and-dirty solution (e.g. for use in a text editor). If the task gets more complicated, consider using a parser instead.
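
For comparison, a parser-based version might look like the sketch below. It uses html.parser from the Python standard library; the class LinkUnwrapper and the function unwrap_with_parser are hypothetical names for this illustration, and a dedicated HTML library may well be the better choice for real input.

```python
from html.parser import HTMLParser


class LinkUnwrapper(HTMLParser):
    """Re-emit the HTML, dropping <a> tags whose href equals `href`."""

    def __init__(self, href):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.href = href
        self.out = []
        self.open_links = []  # one flag per open <a>: True if unwrapped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            unwrap = dict(attrs).get("href") == self.href
            self.open_links.append(unwrap)
            if unwrap:
                return  # drop the opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/> are copied through unchanged
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links and self.open_links.pop():
            return  # drop the closing tag of an unwrapped link
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def unwrap_with_parser(html, href="sample.com"):
    parser = LinkUnwrapper(href)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(unwrap_with_parser(
    '<a href="falanfilan.com" target="_blank"> link </a>\n'
    '<a href="sample.com" target="_blank"> fasafiso </a>'
))
```

Unlike the regex, this does not depend on attribute order or spacing inside the tag. The sketch silently drops comments and doctype declarations, which would need their own handlers.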

Upvotes: 1
