Nicola Peluchetti

Reputation: 76910

Match only the last instance of a pattern with a JavaScript regexp

I want to remove size data from a file name like

var src = 'http://az648995.vo.msecnd.net/win/2015/11/Halo-1024x551.jpg';
src = src.replace(
     /-\d+x\d+(.\S+)$/,
    function( match, contents, offset, s ) {
        return contents;
    }
);

This works as expected and I get

http://az648995.vo.msecnd.net/win/2015/11/Halo.jpg

But if I have a filename like

http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg

it returns

http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-1024x512.jpg

instead of the desired

http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000.jpg

Upvotes: 5

Views: 193

Answers (5)

Redu

Reputation: 26201

In a recent project I needed, more than once, to match only the last occurrence of a pattern that can appear several times in a string. So, a kind of lastIndexOfMatch.

In your case, if the pattern is /-\d+x\d+/, you can make it match only its last occurrence by appending a negative lookahead: /-\d+x\d+(?!.*-\d+x\d+)/. This matches the pattern only where the rest of the string does not contain the same pattern again. Let's see it in action.

let uri = "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg",
    idx = uri.search(/-\d+x\d+(?!.*-\d+x\d+)/),      // lastIndexOfMatch
    res = uri.replace(/-\d+x\d+(?!.*-\d+x\d+)/, ""); // replace the last appearance with ""
console.log(idx);
console.log(res);
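
The same trick can be wrapped in a small helper. This is only a sketch: `lastMatchPattern` is a hypothetical name, not a built-in, and it assumes the pattern passed in has no flags or backreferences that would break when its source is repeated inside the lookahead.

```javascript
// Hypothetical helper: builds a regex that matches only the last occurrence
// of `pattern` by appending a negative lookahead for the same pattern.
// Assumption: flags on `pattern` are ignored and it contains no backreferences.
function lastMatchPattern(pattern) {
    const src = pattern.source;
    return new RegExp(`(?:${src})(?!.*(?:${src}))`);
}

const uri = "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg";
const lastSize = lastMatchPattern(/-\d+x\d+/);

const idx = uri.search(lastSize);      // index where the last size suffix starts
const res = uri.replace(lastSize, ""); // remove only the last size suffix

console.log(uri.slice(idx)); // -1024x512.jpg
console.log(res);            // http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000.jpg
```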

Edit

A common misconception is to search directly for the last item, as intuition dictates, instead of matching the part of the string that precedes the last item. Doing the latter can save the regex engine a great deal of work and yield more efficient code. So how about using /(.*)-\d+x\d+/, which puts the part before the last match into a capture group ($1)? With this change the regex engine takes only 38 steps instead of the 111 steps taken by the previous pattern, so it should be more efficient. OK, let's see...

let uri = "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg",
    res = uri.replace(/(.*)-\d+x\d+/, "$1"); // remove the last appearance
console.log(res);
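
As a quick sanity check, the sketch below runs both variants over the two URLs from the question and confirms they agree. The array and variable names are illustrative only.

```javascript
// The two sample URLs from the question: one size suffix, and two size suffixes.
const urls = [
    "http://az648995.vo.msecnd.net/win/2015/11/Halo-1024x551.jpg",
    "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg"
];

// Variant 1: negative lookahead, match the last size suffix directly.
const viaLookahead = urls.map(u => u.replace(/-\d+x\d+(?!.*-\d+x\d+)/, ""));

// Variant 2: greedy prefix, capture everything before the last size suffix.
const viaGreedyPrefix = urls.map(u => u.replace(/(.*)-\d+x\d+/, "$1"));

console.log(viaLookahead);
console.log(viaGreedyPrefix);
```

Both arrays should contain the Halo.jpg URL and the slot-Drake-08-2000x1000.jpg URL.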

Upvotes: 0

Wiktor Stribiżew

Reputation: 627537

Your regex does not work as expected primarily because of the unescaped dot in the (.\S+)$ part. An unescaped . matches any character except a line break, so it can match the - that starts the second size suffix, and \S matches any non-whitespace character, including a literal dot. Besides unnecessary backtracking, you may get an unexpected result with a string like http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.MORE_TEXT_HERE.jpg.

Assuming the extension is the part of a string after the last dot, you can use

-\d+x\d+(\.[^.\s]+)$


The negated character class [^.\s] matches any character except whitespace and a literal . symbol. Note that there is no need for a callback function inside replace; a plain $1 backreference is enough.

JS demo:

var src = 'http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg';
src = src.replace(/-\d+x\d+(\.[^.\s]+)$/, "$1"); // dot escaped, as in the pattern above
document.body.innerHTML = src;
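
To make the difference concrete, the sketch below runs three patterns against the MORE_TEXT_HERE string mentioned above: the question's original, the same pattern with only the dot escaped, and the stricter [^.\s] version. The variable names are illustrative only.

```javascript
const tricky = "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.MORE_TEXT_HERE.jpg";

// Original: the unescaped dot matches "-", so the FIRST size suffix is removed.
const withUnescapedDot = tricky.replace(/-\d+x\d+(.\S+)$/, "$1");

// Dot escaped, but \S+ still swallows ".MORE_TEXT_HERE.jpg" as the "extension".
const withEscapedDot = tricky.replace(/-\d+x\d+(\.\S+)$/, "$1");

// Strict: the size must sit right before the final extension, so nothing matches.
const withStrictExtension = tricky.replace(/-\d+x\d+(\.[^.\s]+)$/, "$1");

console.log(withUnescapedDot);    // ...slot-Drake-08-1024x512.MORE_TEXT_HERE.jpg
console.log(withEscapedDot);      // ...slot-Drake-08-2000x1000.MORE_TEXT_HERE.jpg
console.log(withStrictExtension); // unchanged
```

Which of the last two behaviours is preferable depends on whether a size suffix that is not directly before the extension should be stripped at all.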

Upvotes: 5

ndnenkov

Reputation: 36110

Slightly change the regex to be a little more explicit:

/-\d+x\d+(\.[^\s-]+)$/
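
Because the capture group excludes hyphens, it cannot swallow a second size suffix, so only the final one matches. A minimal sketch using the two-size URL from the question (the variable names are illustrative):

```javascript
const pattern = /-\d+x\d+(\.[^\s-]+)$/;
const url = "http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg";

const m = url.match(pattern);
console.log(m[0]); // -1024x512.jpg  (the whole matched tail)
console.log(m[1]); // .jpg           (the captured extension)

const cleaned = url.replace(pattern, "$1");
console.log(cleaned); // http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000.jpg
```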

Upvotes: 2

buckley

Reputation: 14129

The regex can be simplified to the following

Replace

-\d+x\d+(\.\S+)

With

$1
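
In JavaScript this replace/with pair could be written roughly as follows. `stripSizeSuffix` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: drops a "-<width>x<height>" suffix that is directly
// followed by a dot and the rest of the file name.
function stripSizeSuffix(src) {
    return src.replace(/-\d+x\d+(\.\S+)/, "$1");
}

console.log(stripSizeSuffix("http://az648995.vo.msecnd.net/win/2015/11/Halo-1024x551.jpg"));
// http://az648995.vo.msecnd.net/win/2015/11/Halo.jpg
console.log(stripSizeSuffix("http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000-1024x512.jpg"));
// http://az648995.vo.msecnd.net/win/2015/11/slot-Drake-08-2000x1000.jpg
```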

Upvotes: 0

Darin Dimitrov

Reputation: 1039498

Try escaping the . and you will be fine:

/-\d+x\d+(\.\S+)$/

Upvotes: 3
