stagas
stagas

Reputation: 4793

Return positions of a regex match() in Javascript?

Is there a way to retrieve the (starting) character positions inside a string of the results of a regex match() in Javascript?

Upvotes: 243

Views: 205291

Answers (12)

Jimbo Jonny
Jimbo Jonny

Reputation: 3579

From developer.mozilla.org docs on the String .match() method:

The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

When dealing with a non-global regex (i.e., no g flag on your regex), the value returned by .match() has an index property...all you have to do is access it.

var index = str.match(/regex/).index;

Here is an example showing it working as well:

var str = 'my string here';

var index = str.match(/here/).index;

console.log(index); // <- 10

I have successfully tested this all the way back to IE5.

Upvotes: 28

Steven Schkolne
Steven Schkolne

Reputation: 89

I had luck using this single-line solution based on matchAll (my use case needs an array of string positions)

let regexp = /bar/g;
let str = 'foobarfoobar';

let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);

console.log(matchIndices)

output: [3, 9]

Upvotes: 4

Claude
Claude

Reputation: 9980

I'm afraid the previous answers (based on exec) don't seem to work in case your regex matches width 0. For instance (Note: /\b/g is the regex that should find all word boundaries) :

var re = /\b/g,
    str = "hello world";
var guard = 10;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected")
      break;
    }
}

One can try to fix this by having the regex match at least 1 character, but this is far from ideal (and means you have to manually add the index at the end of the string)

var re = /\b./g,
    str = "hello world";
var guard = 10;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected")
      break;
    }
}

A better solution (which does only work on newer browsers / needs polyfills on older/IE versions) is to use String.prototype.matchAll()

var re = /\b/g,
    str = "hello world";
console.log(Array.from(str.matchAll(re)).map(match => match.index))

Explanation:

String.prototype.matchAll() expects a global regex (one with g of global flag set). It then returns an iterator. In order to loop over and map() the iterator, it has to be turned into an array (which is exactly what Array.from() does). Like the result of RegExp.prototype.exec(), the resulting elements have an .index field according to the specification.

See the String.prototype.matchAll() and the Array.from() MDN pages for browser support and polyfill options.


Edit: digging a little deeper in search for a solution supported on all browsers

The problem with RegExp.prototype.exec() is that it updates the lastIndex pointer on the regex, and next time starts searching from the previously found lastIndex.

var re = /l/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)

This works great as long as the regex match actually has a width. If using a 0 width regex, this pointer does not increase, and so you get your infinite loop (note: /(?=l)/g is a lookahead for l -- it matches the 0-width string before an l. So it correctly goes to index 2 on the first call of exec(), and then stays there:

var re = /(?=l)/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)

The solution (that is less nice than matchAll(), but should work on all browsers) therefore is to manually increase the lastIndex if the match width is 0 (which may be checked in different ways)

var re = /\b/g,
    str = "hello world";
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);

    // alternative: if (match.index == re.lastIndex) {
    if (match[0].length == 0) {
      // we need to increase lastIndex -- this location was already matched,
      // we don't want to match it again (and get into an infinite loop)
      re.lastIndex++
    }
}

Upvotes: 4

Thomas FONTAINE
Thomas FONTAINE

Reputation: 9

var str = 'my string here';

var index = str.match(/hre/).index;

alert(index); // <- 10

Upvotes: -1

brismuth
brismuth

Reputation: 38392

In modern browsers, you can accomplish this with string.matchAll().

The benefit to this approach vs RegExp.exec() is that it does not rely on the regex being stateful, as in @Gumbo's answer.

let regexp = /bar/g;
let str = 'foobarfoobar';

let matches = [...str.matchAll(regexp)];
matches.forEach((match) => {
    console.log("match found at " + match.index);
});

Upvotes: 35

SwiftNinjaPro
SwiftNinjaPro

Reputation: 853

function trimRegex(str, regex){
    return str.substr(str.match(regex).index).split('').reverse().join('').substr(str.match(regex).index).split('').reverse().join('');
}

let test = '||ab||cd||';
trimRegex(test, /[^|]/);
console.log(test); //output: ab||cd

or

function trimChar(str, trim, req){
    let regex = new RegExp('[^'+trim+']');
    return str.substr(str.match(regex).index).split('').reverse().join('').substr(str.match(regex).index).split('').reverse().join('');
}

let test = '||ab||cd||';
trimChar(test, '|');
console.log(test); //output: ab||cd

Upvotes: 0

Gumbo
Gumbo

Reputation: 655189

exec returns an object with a index property:

var match = /bar/.exec("foobar");
if (match) {
    console.log("match found at " + match.index);
}

And for multiple matches:

var re = /bar/g,
    str = "foobarfoobar";
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
}

Upvotes: 317

felipeab
felipeab

Reputation: 110

Here is a cool feature I discovered recently, I tried this on the console and it seems to work:

var text = "border-bottom-left-radius";

var newText = text.replace(/-/g,function(match, index){
    return " " + index + " ";
});

Which returned: "border 6 bottom 13 left 18 radius"

So this seems to be what you are looking for.

Upvotes: 7

Yaroslav
Yaroslav

Reputation: 31

var str = "The rain in SPAIN stays mainly in the plain";

function searchIndex(str, searchValue, isCaseSensitive) {
  var modifiers = isCaseSensitive ? 'gi' : 'g';
  var regExpValue = new RegExp(searchValue, modifiers);
  var matches = [];
  var startIndex = 0;
  var arr = str.match(regExpValue);

  [].forEach.call(arr, function(element) {
    startIndex = str.indexOf(element, startIndex);
    matches.push(startIndex++);
  });

  return matches;
}

console.log(searchIndex(str, 'ain', true));

Upvotes: 2

stagas
stagas

Reputation: 4793

Here's what I came up with:

// Finds starting and ending positions of quoted text
// in double or single quotes with escape char support like \" \'
var str = "this is a \"quoted\" string as you can 'read'";

var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;

while (match = patt.exec(str)) {
  console.log(match.index + ' ' + patt.lastIndex);
}

Upvotes: 80

Sandro Rosa
Sandro Rosa

Reputation: 517

This member fn returns an array of 0-based positions, if any, of the input word inside the String object

String.prototype.matching_positions = function( _word, _case_sensitive, _whole_words, _multiline )
{
   /*besides '_word' param, others are flags (0|1)*/
   var _match_pattern = "g"+(_case_sensitive?"i":"")+(_multiline?"m":"") ;
   var _bound = _whole_words ? "\\b" : "" ;
   var _re = new RegExp( _bound+_word+_bound, _match_pattern );
   var _pos = [], _chunk, _index = 0 ;

   while( true )
   {
      _chunk = _re.exec( this ) ;
      if ( _chunk == null ) break ;
      _pos.push( _chunk['index'] ) ;
      _re.lastIndex = _chunk['index']+1 ;
   }

   return _pos ;
}

Now try

var _sentence = "What do doers want ? What do doers need ?" ;
var _word = "do" ;
console.log( _sentence.matching_positions( _word, 1, 0, 0 ) );
console.log( _sentence.matching_positions( _word, 1, 1, 0 ) );

You can also input regular expressions:

var _second = "z^2+2z-1" ;
console.log( _second.matching_positions( "[0-9]\z+", 0, 0, 0 ) );

Here one gets the position index of linear term.

Upvotes: 2

Jimmy
Jimmy

Reputation: 37081

You can use the search method of the String object. This will only work for the first match, but will otherwise do what you describe. For example:

"How are you?".search(/are/);
// 4

Upvotes: 12

Related Questions