michail_w

Reputation: 4471

Regexp matching hashtags not wrapped in html tags

I want a regexp that matches hashtags starting with @ or # that are not wrapped in an HTML anchor tag. My expression, (@|#)([a-zA-Z_]+)(?!<\/[a]), doesn't work, because in this text:

<p>@john Olor it amet, consectetuer adipiscing elit. 
Aenean commodofadgfsd 
<a class="autocompletedTag" href="#" data-id="u:2">@john_wayne</a></p>

it matches @john and @john_wayne, but I don't want to match @john_wayne.

How can I do this?
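For reference, here is a minimal reproduction of the problem, using a shortened version of the text above. It suggests why the lookahead does not help: the negative lookahead only rejects the single position directly before </a, so the engine backtracks by one character and still reports a (shortened) match inside the anchor. The variable names are illustrative only.

```javascript
// The pattern from the question: a negative lookahead placed after the name.
const pattern = /(@|#)([a-zA-Z_]+)(?!<\/[a])/g;

// Shortened sample text (an assumption, not the full original input).
const sample = '<p>@john text <a href="#">@john_wayne</a></p>';

// At "@john_wayne</a>" the greedy name first consumes "john_wayne"; the
// lookahead then fails, the engine gives back the final "e", and the
// lookahead succeeds because "e</a" does not start with "</a".
const found = sample.match(pattern);
console.log(found); // [ '@john', '@john_wayn' ]
```

So the tag inside the anchor is still matched, just with its last character cut off.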

Examples

In this code:

<p>@john @kate <a>@royal_baby</a> #england <a>#russia</a></p>

I want to match @john, @kate and #england, but not @royal_baby and #russia.

In this code:

<p>#sale #stack #hello <a>@batman</a> #avengers <a>#iron_man</a></p>

I want to match #sale, #stack, #hello and #avengers, but not @batman and #iron_man.

Upvotes: 1

Views: 441

Answers (1)

HamZa

Reputation: 14921

You may use the following regex:

/(<a[^>]*>.*?[@#][a-zA-Z_]+.*?<\/a>)|([@#][a-zA-Z_]+)/g

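The idea is to match both cases (a whole anchor element that contains a tag, or a bare tag) and use a callback to keep only the bare tags, which are the ones captured by the second group: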

var input = '<p>@john Olor it amet, consectetuer adipiscing elit.\
Aenean commodofadgfsd \
<a class="autocompletedTag" href="#" data-id="u:2">@john_wayne</a></p>\
<p>@john @kate <a>@royal_baby</a> #england <a>#russia</a></p>\
<p>#sale #stack #hello <a>@batman</a> #avengers <a>#iron_man</a></p>';

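var matches = []; // collects the tags found outside <a>...</a>
// replace() is used only to run the callback on every match; its return value is ignored
input.replace(/(<a[^>]*>.*?[@#][a-zA-Z_]+.*?<\/a>)|([@#][a-zA-Z_]+)/g, function(all, anchor, tag){
    if(tag){ // the second group is set only when the tag is not inside an anchor
        matches.push(tag); // then add it to matches
    }
});

document.getElementById('results').innerHTML = matches.join(); // display the results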

Online jsfiddle

Explanation

  • [@#] : match either @ or # one time
  • [a-zA-Z_]+ : match letters and underscore one or more times
  • <a : match <a
  • [^>]*> : match anything except > zero or more times and match > at the end
  • .*?[@#][a-zA-Z_]+.*? : lazily (non-greedy) match the content between <a> and </a>, which must contain a tag
  • <\/a> : match the closing tag </a>
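
As a sketch of the same idea without the replace() side effect, the pattern can be run through String.prototype.matchAll (this assumes an ES2020 environment). The function name extractUnlinkedTags is hypothetical, and the anchor alternative is left uncaptured so that the only group is the bare tag.

```javascript
// Hypothetical helper: returns the @/# tags that are not inside <a>...</a>.
function extractUnlinkedTags(html) {
  // First alternative consumes a whole anchor element that contains a tag;
  // second alternative captures a bare tag in group 1.
  const pattern = /<a[^>]*>.*?[@#][a-zA-Z_]+.*?<\/a>|([@#][a-zA-Z_]+)/g;
  const tags = [];
  for (const m of html.matchAll(pattern)) {
    // Group 1 is undefined when the anchor alternative matched.
    if (m[1] !== undefined) {
      tags.push(m[1]);
    }
  }
  return tags;
}

const first = extractUnlinkedTags(
  '<p>@john @kate <a>@royal_baby</a> #england <a>#russia</a></p>'
);
console.log(first); // [ '@john', '@kate', '#england' ]
```

One caveat that is worth checking against real input: an anchor that contains no tag at all, such as <a>link</a>, lets the lazy .*? run past its closing </a> to the next tag, so a bare tag that follows it can be swallowed by the first alternative. For arbitrary HTML, a DOM parser is likely the safer route.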

Upvotes: 2
