Reputation: 822
I am developing an ASP.NET application and want to add a keyword-linking system.
I want to turn each keyword into a hyperlink to another page, but I must not link a keyword that is already linked (to any page). For example:
it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked keyword.
should convert to:
it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.
As you can see, the first keyword should be left intact.
Could you please help me solve this problem?
I found this link in the ASP.NET forums, but I need to adapt the answer to exclude keywords that are already linked. I've searched everywhere but found nothing.
Upvotes: 3
Views: 2224
Reputation: 12389
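To check whether the keyword is "outside" a link, use a lookahead (?= to test whether the keyword is followed by an opening <tag or by the end of the input $.
Inside the lookahead, [^<>]* matches any number of characters that are neither < nor >, and (?:<\w|$) then requires either < followed by a word character (an opening tag) or the end. \w is shorthand for word characters (letters, digits and the underscore).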
So the pattern could look like:
String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";
String replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";
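The keyword is wrapped in word boundaries \b, and the inline (?i) modifier makes the match case-insensitive.
So this only replaces a keyword whose following text, up to the next tag, ends in an opening tag or the end of the input. A keyword sitting right before </a> is skipped, because </ is not an opening tag.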
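A minimal sketch of the pattern applied with Regex.Replace, assuming the link target is the fixed URL from the question. The class name KeywordLinker and method LinkUnlinked are hypothetical. Note that inside a C# verbatim string a literal quote is written "" (a backslash does not escape it), and that .NET uses $0, not \0, to insert the whole match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the lookahead approach; names are made up.
public static class KeywordLinker
{
    // "keyword" as a whole word (\b...\b), case-insensitive via (?i), and only when
    // the non-tag text after it runs into an opening tag (<\w) or the end ($).
    public const string Pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";

    // Verbatim string: "" is a literal quote. $0 is .NET's whole-match
    // substitution, so the keyword keeps its original casing.
    public const string Replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

    public static string LinkUnlinked(string html) =>
        Regex.Replace(html, Pattern, Replacement);
}
```

Calling KeywordLinker.LinkUnlinked on the question's example leaves the first keyword (followed by </a>) untouched and wraps the second one, which is followed only by "." and the end of the string.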
UPDATE: To also replace a keyword that sits "inside" other tags (i.e. one followed by a closing tag other than </a), add |<\/[^a] to the alternation:
String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
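A sketch comparing the original and updated patterns on the same input (the class name UpdatedKeywordLinker is hypothetical). Two side effects of <\/[^a] are worth checking: it rejects any closing tag whose name starts with a, so a keyword right before </abbr> stays unlinked, and it links a keyword inside <a><b>keyword</b></a>, because the lookahead only sees the </b>.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; both patterns are taken verbatim from the answer.
public static class UpdatedKeywordLinker
{
    // Keyword must be followed by an opening tag or the end.
    public const string OriginalPattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";

    // Also accepts a closing tag whose name does not start with "a".
    public const string UpdatedPattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";

    public const string Replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

    public static string Apply(string pattern, string html) =>
        Regex.Replace(html, pattern, Replacement);
}
```

With the input a <b>keyword</b> and <a href="x">keyword</a>, the original pattern changes nothing, while the updated one links the bold keyword and still skips the one directly inside the anchor.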
Upvotes: 2
Reputation: 16440
Don't use regular expressions for sophisticated HTML parsing like this. Use a proper HTML parser instead; here's why.
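To show what tracking structure buys over a single lookahead, here is a minimal sketch, assuming well-formed input without comments, scripts or > inside attribute values. It is a hand-rolled tag/text splitter, not a real HTML parser; for real-world markup a dedicated library (HtmlAgilityPack is a common choice in .NET) is the safer option. TagAwareLinker and its method are hypothetical names. It counts open <a> elements and links the keyword only in text outside every anchor, so nested cases like <a><b>keyword</b></a> and attribute values are left alone.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT a full HTML parser: it ignores comments, <script>/<style>
// content and '>' inside quoted attribute values.
public static class TagAwareLinker
{
    // A "tag" here is '<' followed by everything up to the next '>'.
    private static readonly Regex Tag = new Regex(@"<[^>]*>");
    // "<a" followed by a word boundary, so "<abbr>" is not treated as an anchor.
    private static readonly Regex OpenAnchor = new Regex(@"^<a\b", RegexOptions.IgnoreCase);
    private static readonly Regex CloseAnchor = new Regex(@"^</a\s*>", RegexOptions.IgnoreCase);

    public static string Link(string html, string keyword, string url)
    {
        // Whole-word, case-insensitive; Regex.Escape guards special characters.
        var word = new Regex(@"\b" + Regex.Escape(keyword) + @"\b", RegexOptions.IgnoreCase);
        var result = new StringBuilder();
        int anchorDepth = 0;
        int pos = 0;

        foreach (Match tag in Tag.Matches(html))
        {
            // Text run between the previous tag and this one.
            result.Append(LinkText(html.Substring(pos, tag.Index - pos), anchorDepth, word, url));

            if (OpenAnchor.IsMatch(tag.Value)) anchorDepth++;
            else if (CloseAnchor.IsMatch(tag.Value) && anchorDepth > 0) anchorDepth--;

            result.Append(tag.Value); // tags and their attributes are copied unchanged
            pos = tag.Index + tag.Length;
        }

        // Trailing text after the last tag.
        result.Append(LinkText(html.Substring(pos), anchorDepth, word, url));
        return result.ToString();
    }

    private static string LinkText(string text, int anchorDepth, Regex word, string url)
    {
        if (anchorDepth > 0) return text; // inside an existing link: leave it alone
        // A MatchEvaluator keeps '$' in the URL from being read as a substitution.
        return word.Replace(text, m => "<a href=\"" + url + "\">" + m.Value + "</a>");
    }
}
```

On the question's example, TagAwareLinker.Link(input, "keyword", "http://newlycreatedLink.com") produces the desired output, and unlike the updated regex it does not create a nested link for <a href="x"><b>keyword</b></a>.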
Upvotes: 1