ozen

Reputation: 47

JavaScript Regex Catch Word

I want to match a word in a paragraph. I do not want to use a word boundary because of problems with Unicode characters (şöüİıçğ), so I use a regex like the one below. It throws an "Invalid group" error. Can someone help?

var paragraphy= "Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır.";
var word="metin;";
var regex = new RegExp("([\\s>]|^)("+word+")(?=([\\.\\,\\;\\?\\!](?=[\\s<])|(?<![\\.\\,\\;\\?\\!])[<\\s]|$))", "gi");
console.log(paragraphy.match(regex));

I want this result: ["metin"]

Upvotes: 1

Views: 200

Answers (2)

anubhava

Reputation: 786091

Based on the discussion in the comments under your question, you can use this replace:

    var word = "metin";

    var re = new RegExp("(^|[\\s>])(" + word + ")[.,;?!]?(?=[\\s<]|$)", "gi");

    var str = 'Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır';
     
    var result = str.replace(re, '$1<span>$2</span>');

    alert(result);

//=> Bu örnek bir metindir <span>bu</span> <span>metin</span> test amaçlı yazılmıştır
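Note that the replace above consumes the trailing punctuation, so the `;` after the word is dropped from the output. If you want to keep it, one possible variation is to capture the punctuation in a third group and put it back. This is only a sketch: `highlightWord` is a hypothetical helper name, and it assumes `word` contains no regex metacharacters.

```javascript
// Hypothetical helper (not part of the answer above): wrap each whole-word
// occurrence in <span> and keep any trailing punctuation via a third group.
// Assumes `word` contains no regex metacharacters.
function highlightWord(text, word) {
  var re = new RegExp("(^|[\\s>])(" + word + ")([.,;?!]?)(?=[\\s<]|$)", "gi");
  // $1 = leading boundary, $2 = the word as written, $3 = optional punctuation
  return text.replace(re, "$1<span>$2</span>$3");
}

var sample = "Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır";
var highlighted = highlightWord(sample, "metin");
console.log(highlighted);
```

Here "metindir" is left alone because the character after "metin" is neither punctuation, whitespace, `<`, nor the end of the string.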


Upvotes: 1

Wiktor Stribiżew

Reputation: 627468

You can simplify the boundary check with a ([\\s>]|^) group before the word and a (?=[.,;?!\\s<]) lookahead after it. Also, since you use the global flag, define capture groups, and need to access one of them after matching, you should call RegExp#exec() inside a loop.
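A minimal sketch of that exec() loop, collecting only capture group 2 (the word itself). `findWord` is a hypothetical helper name used for illustration, and the sketch assumes `word` has already had any trailing punctuation removed and contains no regex metacharacters.

```javascript
// Hypothetical helper: return capture group 2 (the word) for every match.
function findWord(text, word) {
  var regex = new RegExp("([\\s>]|^)(" + word + ")(?=[.,;?!\\s<])", "gi");
  var hits = [];
  var m;
  // With the "g" flag, each exec() call resumes from regex.lastIndex.
  while ((m = regex.exec(text)) !== null) {
    // Guard against zero-length matches (e.g. an empty word) looping forever.
    if (m.index === regex.lastIndex) {
      regex.lastIndex++;
    }
    hits.push(m[2]);
  }
  return hits;
}

var paragraphy = "Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır.";
var found = findWord(paragraphy, "metin");
console.log(found);
```

String#match with a global regex returns whole matches only, which would include the leading whitespace or `>`; the loop gives access to the individual groups.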

Also, if the search word itself contains punctuation, remove it first. If it only appears at the end of the word, pre-process it with word = word.replace(/[,.;?!<]+$/, '').

var paragraphy = "Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır.";
var word="metin;";
var regex = new RegExp("([\\s>]|^)("+word.replace(/[,.;?!<]+$/, '')+")(?=[.,;?!\\s<])", "gi");
// Use $2 (the matched word) instead of a hard-coded string so the original casing is kept.
var res = paragraphy.replace(regex, '$1<span>$2</span>');
document.body.innerHTML = "<pre>" + res + "</pre>";
/* Snippet CSS: color the highlighted span red */
span {
  color: #FF0000;
}
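The pre-processing step can also be pulled into small helpers. The sketch below additionally escapes regex metacharacters in the search word; that escaping is my assumption about what a general-purpose version might need and is not part of the answer above. All helper names are hypothetical.

```javascript
// Strip punctuation from the end of the search word, as in the answer.
function stripTrailingPunctuation(word) {
  return word.replace(/[,.;?!<]+$/, "");
}

// Assumption (not in the answer): escape regex metacharacters so that a
// word such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same pattern as above from a cleaned, escaped word.
function buildWordRegex(word) {
  var cleaned = escapeRegExp(stripTrailingPunctuation(word));
  return new RegExp("([\\s>]|^)(" + cleaned + ")(?=[.,;?!\\s<])", "gi");
}

var text = "Bu örnek bir metindir <span>bu</span> metin; test amaçlı yazılmıştır.";
var marked = text.replace(buildWordRegex("metin;"), "$1<span>$2</span>");
console.log(marked);
```

Because the punctuation sits in a lookahead here, it is not consumed and stays in the output after the closing `</span>`.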

Upvotes: 1
