RKI

Reputation: 433

How to replace text with a link, but skip text that is already inside links?

How can I replace specific text with a link, but skip the occurrences of that text that are already inside links?

Example:

<a href="helloworld.com">Lorem ipsum dolor sit amet</a>, consectetur
adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore
magna aliqua. Lorem ipsum dolor sit amet, consectetur
<a href="adipisicing.com">adipisicing</a> elit, sed do eiusmod tempor
incididunt ut labore et dolore <a href="helloworld.com">magna aliqua.
Lorem ipsum</a> dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua.

As you can see, I need to replace "Lorem ipsum" with <a href="somewhere.com">Lorem ipsum</a> in the second sentence, but skip the occurrences of "Lorem ipsum" that are already inside links.

Thanks!

Upvotes: 0

Views: 450

Answers (1)

Jens

Reputation: 25563

Regular expressions are not well suited to processing HTML. Any regex-based solution will fail miserably on comments, embedded JavaScript, or malformed HTML.

That said, if you strictly control the structure of your documents, you can try the regex approach. To match every "Lorem ipsum" that is not inside an <a> tag, I'd use:

Lorem ipsum(?=([^<]*($|<a |<[^/]|</[^a]))*($|(?<=a )))

This pattern uses a lookahead assertion: it matches "Lorem ipsum" only if an opening <a> tag comes before the next closing </a> tag, or if no further <a> tags follow. See it in action at RegExr.
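A minimal Python sketch of the regex approach, using the pattern above unchanged with `re.sub`. The helper name `wrap_unlinked` and the target URL are hypothetical. Note that the pattern looks for a literal `<a ` (with a space): if the next opening tag is written as `<a` followed by a newline or tab, the "Lorem ipsum" before it is treated as if it were inside a link and left unchanged, which is exactly the kind of structural assumption mentioned above.

```python
import re

# The pattern from this answer, unchanged. Python's `re` module supports
# the fixed-width lookbehind (?<=a ) used at the end.
NOT_IN_LINK = re.compile(
    r"Lorem ipsum(?=([^<]*($|<a |<[^/]|</[^a]))*($|(?<=a )))"
)


def wrap_unlinked(html: str, href: str) -> str:
    """Wrap every "Lorem ipsum" that is not inside an <a> element in a link.

    Hypothetical helper: `href` is inserted verbatim, without escaping.
    """
    # A replacement function (instead of a template string) keeps re.sub
    # from interpreting backslashes that might appear in href.
    return NOT_IN_LINK.sub(lambda m: f'<a href="{href}">{m.group(0)}</a>', html)


if __name__ == "__main__":
    html = ('<a href="helloworld.com">Lorem ipsum</a>, consectetur. '
            'Lorem ipsum dolor <a href="adipisicing.com">adipisicing</a> elit '
            '<a href="helloworld.com">magna aliqua. Lorem ipsum</a> dolor.')
    print(wrap_unlinked(html, "somewhere.com"))
```

On this input only the "Lorem ipsum" in the second sentence is wrapped; the two occurrences inside existing links are left as they are.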

As you can see, it is probably better to use an HTML parser. =)
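Following that advice, here is a minimal sketch using Python's standard-library `html.parser` as one possible parser (the question does not name a language). The class `LinkWrapper`, the helper `link_phrase`, and the choice to also skip `<script>` and `<style>` content are illustrative assumptions. The parser copies tags, comments, and entity references back out as it sees them and only rewrites text that is not inside an `<a>` element, so comments and tags split across lines no longer cause trouble. It is not byte-for-byte lossless: closing tags are re-emitted in lowercase, entity references missing their trailing semicolon gain one, and CDATA sections are dropped.

```python
from html.parser import HTMLParser

# Text inside these elements is never rewritten. Skipping script/style
# (not just <a>) is an assumption of this sketch.
SKIP_TAGS = {"a", "script", "style"}


class LinkWrapper(HTMLParser):
    """Re-emit HTML, wrapping `phrase` in a link wherever it appears in
    text that is not inside one of SKIP_TAGS (hypothetical helper)."""

    def __init__(self, phrase, href):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.phrase = phrase
        self.link = f'<a href="{href}">{phrase}</a>'
        self.skip_depth = 0  # how many SKIP_TAGS elements we are inside
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        # get_starttag_text() returns the tag exactly as it appeared.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the nesting depth.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = data.replace(self.phrase, self.link)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        # Comments are copied through and never rewritten.
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def link_phrase(html, phrase, href):
    parser = LinkWrapper(phrase, href)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `link_phrase(html, "Lorem ipsum", "somewhere.com")` on the question's example wraps only the occurrence in the second sentence, even when an `<a` tag is broken across lines.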

Upvotes: 4
