Fizi

Reputation: 1861

Skip regex matching if the match is within a particular HTML tag

This is a follow up to: Javascript regex placeholder prints string instead of its value

I am trying to build a JavaScript function that looks for a pattern and converts it into a link.

var re = /Ticket-([0-9]*?(?=-)-[0-9]*)/;
var str = 'ASD Ticket-492-367 - Make my day.';
var t = str.replace(re, '<a href="http://myworld/ticket/$1">$&</a>');

I have now run into a problem: if my string already contains <a> tags, the replacement wraps extra tags around the existing links, which garbles the output. Is there a jQuery/JS way for the regex matching to skip content that sits inside a particular tag? For example, could I wrap a <div> around the content, parse it as a DOM node, and operate on it that way? I am very new to JS, so I apologize if my thinking is completely off the mark.

Update, for the use case: let's say the text I am getting already contains a link such as the one below:

<a href="http://myworld/ticket/4385-21557">Ticket-4385-21557 - abc xyz</a>

The replacement will wrap another tag around the matched string 'Ticket-4385-21557'. This is a legacy string that is already in the system, and I cannot change it. So the idea is to work around it by skipping the string inside the <a> tag.
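To make the failure concrete, here is a minimal reproduction using the regex and the legacy string from above. The variable names `legacy` and `wrapped` are just for illustration.

```javascript
// Reproduces the double wrapping described above.
var re = /Ticket-([0-9]*?(?=-)-[0-9]*)/;
var legacy = '<a href="http://myworld/ticket/4385-21557">Ticket-4385-21557 - abc xyz</a>';

// The regex knows nothing about the surrounding <a>, so it matches the link text
// and a second anchor ends up nested inside the first one.
var wrapped = legacy.replace(re, '<a href="http://myworld/ticket/$1">$&</a>');

console.log(wrapped);
// <a href="http://myworld/ticket/4385-21557"><a href="http://myworld/ticket/4385-21557">Ticket-4385-21557</a> - abc xyz</a>
```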

Upvotes: 1

Views: 841

Answers (3)

zfrisch

Reputation: 8670

Regex for aopen (matches an opening <a ...> tag):

/<a\s[^>]*>/ig

Regex for aclose (matches a closing </a> tag):

/<\/a\s*>/ig

You would want to strip the existing tags from the string with these before applying your original regular expression, i.e.:

var t = str.replace(aopen, '');
t = t.replace(aclose, '');
t = t.replace(re, '<a href="blahblah">$&</a>');
document.write(t);

I created a fiddle, but it won't save for some reason. Here's my code - JavaScript:

window.onload = function() {
    var re = /Ticket-([0-9]*?(?=-)-[0-9]*)/g;
    var str = document.body.innerHTML;
    var aopen = /<a\s[^>]*>/ig;   // opening <a ...> tags
    var aclose = /<\/a\s*>/ig;    // closing </a> tags
    // Strip the existing links first so nothing gets wrapped twice.
    var t = str.replace(aopen, '');
    t = t.replace(aclose, '');
    t = t.replace(re, '<a href="http://myworld/ticket/$1">$&</a>');
    document.write(t);
};

Here's my code - HTML:

<!DOCTYPE html>
<html>
<body>
Ticket-445-1235 - Make my day<br>
Ticket-445-1255 - Make his day<br>
Ticket-443-4356 - He's feeling lucky<br>
Ticket-443-5555 - punk<br>
<a href="whatever.txt">Ticket-423-5557 - Sdadf </a> <br>
</body>
</html>

Upvotes: 1

Ethan Brown

Reputation: 27292

A full answer would depend on knowing a little bit more about the input you're dealing with, but I think I can certainly set you on the right path.

There's no inherent way to say "replace this thing unless it's in this other thing." However, you can combine alternation and function replacements to solve this problem.

At the heart of your problem, you're actually looking for two different things: <a> tags, which you wish to ignore, and specifically formatted strings (which I'll simplify here to things that look like /Ticket-\d+/ for the sake of keeping this answer simple). That suggests alternation. The question is, how do you tell which alternative the regex picked? The easiest way is to use function replacement:

var test = '<a href="#">Ticket-37</a> blah blah Ticket-42';
// expected output:
// <a href="#">Ticket-37</a> blah blah <a href="#">Ticket-42</a>
var output = test.replace(/<a\s.*?<\/a>|Ticket-(\d+)/g, function(m, g1) {
    if(/^<a\s/.test(m)) return m;  // ignore existing links
    return '<a href="#">Ticket-' + g1 + '</a>';
});

What's happening here is that the .replace call is looking for either <a> tags or things that look like /Ticket-\d+/, and it's going to replace them all. However, with <a> tags, it simply replaces them with what they were already; essentially leaving them unmodified (this is a nice feature, because you could actually re-format the <a> tags here if you needed to clean them up as well).
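For the two-number ticket format and the http://myworld/ticket/ URL from the question, the same idea might look roughly like the sketch below. The function name `linkifyTickets` is made up for this example, and I'm assuming a ticket id is always digits, a hyphen, then digits.

```javascript
// Sketch: alternation + function replacement applied to the question's ticket format.
// linkifyTickets is a hypothetical helper name, not part of any library.
function linkifyTickets(html) {
  // First alternative swallows a whole existing link (even across newlines);
  // second alternative captures the "NNN-NNN" part of a bare ticket reference.
  var pattern = /<a\s[\s\S]*?<\/a>|Ticket-(\d+-\d+)/g;
  return html.replace(pattern, function (match, ticketId) {
    // ticketId is undefined when the <a>...</a> alternative matched.
    if (ticketId === undefined) return match; // leave existing links untouched
    return '<a href="http://myworld/ticket/' + ticketId + '">' + match + '</a>';
  });
}

var mixed = '<a href="http://myworld/ticket/4385-21557">Ticket-4385-21557 - abc xyz</a>' +
            ' and Ticket-492-367 - Make my day.';
console.log(linkifyTickets(mixed));
```

Checking the capture group for undefined instead of re-testing the match with a second regex keeps the "which alternative fired" decision in one place.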

Standard caveat applies to using regex with HTML: you can't guarantee correct parsing of HTML with regexes. HTML is not a regular language, so the best you can do is cover most reasonable cases. It's certainly possible to construct HTML that would foil this method. Is it likely that you would see that in reality? Depends on what your reality is, but probably not. The "robust" solution would be to employ an HTML parser, look for text nodes (that are not direct children of <a> nodes), and make your replacements inside the parsed tree.
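As a rough illustration of the "only touch text outside <a>" idea without a DOM, here is a string-only sketch. To be clear, this is a small tag/text tokenizer standing in for a real HTML parser, not the parser-based solution itself: it assumes no ">" appears inside attribute values and ignores comments and scripts. The name `linkifyOutsideAnchors` and the two-number ticket pattern are assumptions for the example.

```javascript
// Sketch only: a tiny tag/text tokenizer, NOT a real HTML parser.
// linkifyOutsideAnchors is a hypothetical helper name.
function linkifyOutsideAnchors(html) {
  // The capturing group makes split() keep the tags:
  // even indexes hold text, odd indexes hold tags.
  var parts = html.split(/(<[^>]*>)/);
  var anchorDepth = 0; // how many <a> elements we are currently inside

  for (var i = 0; i < parts.length; i++) {
    var part = parts[i];
    if (i % 2 === 1) {
      // A tag: track entering and leaving <a> elements.
      if (/^<a[\s>]/i.test(part)) anchorDepth++;
      else if (/^<\/a\s*>/i.test(part) && anchorDepth > 0) anchorDepth--;
    } else if (anchorDepth === 0) {
      // Text outside any <a>: safe to linkify.
      parts[i] = part.replace(/Ticket-(\d+-\d+)/g,
        '<a href="http://myworld/ticket/$1">$&</a>');
    }
  }
  return parts.join('');
}

var sample = '<a href="http://myworld/ticket/4385-21557">Ticket-4385-21557 - <b>abc</b> xyz</a>' +
             '<br>Ticket-492-367 - Make my day.';
console.log(linkifyOutsideAnchors(sample));
```

Because it tracks depth rather than matching a whole link in one go, text nested in other tags inside the anchor (the <b> above) is skipped as well. In a browser, walking the text nodes of the parsed DOM would be the sturdier version of the same idea.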

Upvotes: 2

Eugene Glova

Reputation: 1553

If you are getting str from a DOM element, you could just use .text() to get only the text, without the HTML.

HTML

<div class="with-anchor"><a href="http://example.com">ASD Ticket-492-367</a> - Make my day.</div>

JS

var str = $("div.with-anchor").text(); // ASD Ticket-492-367 - Make my day.
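Once you have the plain text, linkifying it is straightforward because there are no existing anchors left to collide with. A sketch of that step follows, with two assumptions: the helper names `escapeHtml` and `linkifyPlainText` are made up here, and the text should be HTML-escaped before it goes back into the page as markup. Note that the original link is lost with this approach, since .text() discards it.

```javascript
// Sketch: linkify a plain-text string such as the one returned by .text().
// escapeHtml and linkifyPlainText are hypothetical helper names.
function escapeHtml(text) {
  // Escape & first so the entities added afterwards are not double-escaped.
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;')
             .replace(/"/g, '&quot;');
}

function linkifyPlainText(text) {
  // Assumes a ticket id is digits, a hyphen, then digits.
  return escapeHtml(text).replace(/Ticket-(\d+-\d+)/g,
    '<a href="http://myworld/ticket/$1">$&</a>');
}

console.log(linkifyPlainText('ASD Ticket-492-367 - Make my day.'));
// ASD <a href="http://myworld/ticket/492-367">Ticket-492-367</a> - Make my day.
```

The result could then be written back with something like .html(), if replacing the element's contents wholesale is acceptable for your page.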

Upvotes: 1
