MartyIX

Reputation: 28648

How to obtain index of subpattern in JavaScript regexp?

I wrote a regular expression in JavaScript to search for searchedUrl in a string:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';

var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)"; 
var regex = new RegExp(regexpStr , 'i');

var match = input.match(regex);
console.log(match); // returns an array

Output:

["url(  test  )", index: 5, input: "1234 url(  test  ) 5678"]

Now I would like to obtain the position of searchedUrl (in the example above, the position of test in '1234 url(  test  ) 5678').

How can I do that?

Upvotes: 4

Views: 2108

Answers (4)

Shlomi Lachmish

Reputation: 581

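You can add the d flag (hasIndices, added in ES2022) to the regex to generate start and end indices for capture-group matches. They are exposed in the indices property of the result of exec(), so wrapping searchedUrl in a group makes its range available as indices[1].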

const input = '1234 url(  test  ) 5678';
const searchedUrl = 'test';

const regexpStr = "url\\(\\s*("+searchedUrl+")\\s*\\)"; 
const regex = new RegExp(regexpStr , 'id');

const match = regex.exec(input).indices[1];
console.log(match); // [11, 15] (the end index is exclusive)
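The d flag also records indices for named groups under indices.groups, which avoids counting group numbers. A minimal sketch, assuming a runtime that supports the flag; the group name url is illustrative, and the null check covers the case where nothing matches (exec() returns null):

```javascript
const input = '1234 url(  test  ) 5678';
const searchedUrl = 'test';

// Named capturing group "url" around the searched text; 'd' asks exec() for indices.
const regex = new RegExp("url\\(\\s*(?<url>" + searchedUrl + ")\\s*\\)", 'id');

// exec() returns null when nothing matches, so guard before reading indices.
const match = regex.exec(input);
const range = match ? match.indices.groups.url : null;

console.log(range); // [11, 15]: start is inclusive, end is exclusive
```

The pair can be passed straight to input.slice(start, end) to recover the matched text.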

Upvotes: 2

Pebbl

Reputation: 35995

As far as I could tell, it isn't possible to get the offset of a sub-match automatically; you have to calculate it yourself using either the lastIndex property of the RegExp or the index property of the match object returned by exec(). Depending on which you use, you'll have to either add or subtract the lengths of the groups around your sub-match. This does mean you have to put the first (or last) part of the regular expression, up to the pattern you wish to locate, in groups.

lastIndex is only updated when using the g (global) or y (sticky) flag, and it records the index just after the entire match. So if you wish to use lastIndex, you'll need to work backwards from the end of your pattern.
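To illustrate working backwards from lastIndex, here is a sketch that assumes the g flag and three groups (prefix, URL, suffix). After exec(), lastIndex points just past the closing ), so subtracting the lengths of the suffix group and the URL group gives the start of the URL:

```javascript
const input = '1234 url(  test  ) 5678';
const searchedUrl = 'test';

// With the global flag, exec() sets re.lastIndex to the index just after the match.
const re = new RegExp("(url\\(\\s*)(" + searchedUrl + ")(\\s*\\))", 'gi');
const m = re.exec(input);

let start = -1;
if (m) {
  // lastIndex is 18 here; walk back over group 3 (suffix), then group 2 (URL).
  start = re.lastIndex - m[3].length - m[2].length;
}
console.log(start); // 11
```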

For more information on the exec() method, see here:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

The following succinctly shows the solution in operation:

var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
console.log( m.index + m[1].length ); // 8: the position of 123

update

This would apply to your issue using the following:

var input = '1234 url(  test  ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);

Then to get the submatch offset you can use:

match.index + match[1].length

Thanks to the parenthesized group, match[1] now contains "url(  " (url( plus two spaces), and its length tells us the offset of the URL within the match.

update 2

Things are a little more complicated if the RegExp has other grouped patterns before the actual pattern you want to locate. It is then simply a matter of adding together the length of each preceding group.

var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );

To get the offset of "it depends on your perspective", you would use:

m.index + m[1].length + m[2].length + m[3].length; // 34

If you know that parts of the RegExp always match a fixed length, you can replace those with hard-coded numbers. However, it's probably best to keep the .length checks above, in case you or someone else ever changes what your expression matches.
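The summing can be wrapped in a small helper. This is a sketch under the same assumption as above: everything between the start of the match and the target group is covered by consecutive, top-level (non-nested) capturing groups. The name offsetOfGroup is hypothetical:

```javascript
// Hypothetical helper: offset of capture group k, assuming groups 1..k-1 are
// top-level, consecutive, and together cover everything before group k.
function offsetOfGroup(m, k) {
  let offset = m.index;
  for (let i = 1; i < k; i++) {
    // An optional group that did not participate is undefined and consumed nothing.
    offset += (m[i] ?? '').length;
  }
  return offset;
}

const s = '~- [This may or may not be random|it depends on your perspective] -~';
const r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
const m = r.exec(s);

const pos = offsetOfGroup(m, 4);
console.log(pos); // 34
console.log(s.slice(pos, pos + m[4].length)); // the text of group 4
```

With nested groups the lengths would be counted twice, so the helper only fits patterns shaped like the one above.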

Upvotes: 3

Michael Geary

Reputation: 28850

You don't need the index.

This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.

But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)

I had a feeling something was missing; that's why I asked you about it.

As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:

'1234 url(  <b>test</b>  ) 5678'

(Let me know if you meant something else by "highlight".)

You can use character indexes to do that, but there is a much easier way using the regular expression itself.

Getting the index

But since you asked, if you did need the index, you could get it with code like this:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)"; 
var regex = new RegExp( regexpStr , 'i' );

var match = input.match( regex );
var start = match[1].length;

This is a bit simpler than the code in the other answers, but any of them would work equally well. This approach works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index.

Slicing and dicing

Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:

// Wrap url in <b></b> tag by slicing and pasting strings
var output =
    input.slice( 0, start ) +
    '<b>' + url + '</b>' +
    input.slice( start + url.length );

console.log( output );

That will certainly work, but it is really doing things the hard way.

Also, I left out some error handling code. What if there is no matching URL? match will be null, and match[1] will throw a TypeError. But instead of worrying about that, let's see how we can do it without any character indexing at all.
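If you do stay with the index-and-slice approach, this sketch adds the missing null check. The function name highlightAt is hypothetical; it returns the input unchanged when no URL matches, and it slices the original text so the case of the matched URL is preserved under the i flag:

```javascript
// Hypothetical helper: wraps one url(...) occurrence of `url` in <b></b> using a
// character index. Because .* is greedy, the last occurrence is the one wrapped.
function highlightAt(input, url) {
  const regex = new RegExp("^(.*url\\(\\s*)" + url + "\\s*\\)", 'i');
  const match = input.match(regex);
  if (match === null) {
    return input; // String.prototype.match returns null when nothing matches
  }
  const start = match[1].length;
  const end = start + url.length;
  return input.slice(0, start) + '<b>' + input.slice(start, end) + '</b>' + input.slice(end);
}

console.log(highlightAt('1234 url(  test  ) 5678', 'test'));
console.log(highlightAt('no url here', 'test'));
```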

The easy way

Let the regular expression do the work for you. Here's the whole thing:

var input = '1234 url(  test  ) 5678';
var url = 'test';

var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))"; 
var regex = new RegExp( regexpStr , 'i' );

var output = input.replace( regex, "$1<b>$2</b>$3" );

console.log( output );

This code has three groups in the regular expression: one to capture the URL itself, and groups before and after it to capture the rest of the matching text so we don't lose it. Then a simple .replace() and you're done!

You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
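One caveat applies to every approach in this thread: searchedUrl is pasted into the pattern as-is, so characters common in real URLs, such as . ? ( ), are treated as regex syntax. Here is a sketch that escapes them first. escapeRegExp is a hypothetical helper name; its character class follows the escaping function shown in MDN's regular expressions guide:

```javascript
// Hypothetical helper: backslash-escape regex metacharacters so the string
// matches literally when concatenated into a RegExp source.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const input = '1234 url(  a.png?v=1  ) 5678';
const url = 'a.png?v=1';

const regex = new RegExp("(url\\(\\s*)(" + escapeRegExp(url) + ")(\\s*\\))", 'i');
const output = input.replace(regex, "$1<b>$2</b>$3");
console.log(output);

// Without escaping, "?" makes the "g" optional, the pattern no longer matches,
// and replace() returns the input unchanged.
const unescaped = input.replace(new RegExp("(url\\(\\s*)(" + url + ")(\\s*\\))", 'i'), "$1<b>$2</b>$3");
console.log(unescaped === input);
```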

Upvotes: 1

Qtax

Reputation: 33908

Without the d flag (supported only by newer engines), JS doesn't have a direct way to get the index of a subpattern/capturing group. But you can work around that with some tricks. For example:

var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');

var m = re.exec(input);
if(m){
    var index = m.index + m[1].length;
    console.log("url found at " + index);
}
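If the URL can appear more than once, the same m.index + m[1].length calculation works inside an exec() loop with the g flag. A sketch; the second url(...) in the input is made up for illustration:

```javascript
const input = '1234 url(  test  ) 5678 url(test)';
const searchedUrl = 'test';

// The g flag makes each exec() call resume from re.lastIndex.
const re = new RegExp("(url\\(\\s*)" + searchedUrl + "\\s*\\)", 'gi');

const positions = [];
let m;
while ((m = re.exec(input)) !== null) {
  // Group 1 holds everything from the match start up to the URL.
  positions.push(m.index + m[1].length);
}
console.log(positions); // [11, 28]
```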

Upvotes: 2
