Aaron Ponti

Reputation: 13

Strange behavior of regexp in JavaScript

I wrote a simple JavaScript function to split a file name into parts: given a file name such as 'image01.png', it returns 'image', '01', and 'png'.

For this I use the following regular expression:

var reg = /(\D+)(\d+).(\S+)$/;

This works.

However, I would also like to be able to split something like 'day12Image01.png' into 'day12Image', '01', 'png'. More generally, the body may contain any number of additional digits, as long as they do not fall right before the extension.

I tried with:

var reg = /(.+)(\d+).(\S+)$/;

or the alternative:

var reg = /(\S+)(\d+).(\S+)$/;

Confusingly (to me), if I apply those regular expressions to 'image01.png', I get the following decomposition: 'image0', '1', 'png'.

Why is the '0' being assigned to the body instead of the numerical index in these cases?

Thanks for any feedback.

Upvotes: 1

Views: 70

Answers (3)

Sean Airey

Reputation: 6372

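By default, quantifiers are greedy: they match as much as they can. The leading (.+) or (\S+) therefore swallows as many characters as possible, including the '0', and backtracks only until the rest of the pattern can match. Since + means one or more, a single digit is enough for (\d+), so it ends up with just the '1'. Make the first quantifier non-greedy with ?, and escape the dot so that it matches only a literal period (otherwise 'day12Image01.png' splits into 'day', '12', 'mage01.png'):

var reg = /(.+?)(\d+)\.(\S+)$/;

Or

var reg = /(\S+?)(\d+)\.(\S+)$/;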
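To see the difference between the greedy and the non-greedy quantifier, here is a minimal sketch that runs both variants on the two file names from the question. The variable names are only illustrative, and the dot is written as \. here (an assumption on my part) so that it matches only a literal period.

```javascript
// Greedy: (.+) first swallows the whole string, then gives characters
// back one at a time until the rest of the pattern can match.
var greedy = /(.+)(\d+)\.(\S+)$/;

// Non-greedy: (.+?) starts with a single character and grows only
// when the rest of the pattern cannot match otherwise.
var lazy = /(.+?)(\d+)\.(\S+)$/;

// slice(1) drops the full match and keeps only the three capture groups.
var greedyParts = 'image01.png'.match(greedy).slice(1);
var lazyParts = 'image01.png'.match(lazy).slice(1);
var lazyLong = 'day12Image01.png'.match(lazy).slice(1);

console.log(greedyParts); // [ 'image0', '1', 'png' ]
console.log(lazyParts);   // [ 'image', '01', 'png' ]
console.log(lazyLong);    // [ 'day12Image', '01', 'png' ]
```

In the greedy case a single digit is enough to satisfy (\d+), so the engine stops backtracking as soon as it has given back the '1'.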

Upvotes: 0

begemotv2718

Reputation: 868

Try a non-greedy regular expression with the dot escaped: /(\S+?)(\d+)\.(\S+)$/. As far as I know, this should work in JavaScript.

Upvotes: 1

VisioN

Reputation: 145398

Here is one possible regular expression that should work fine:

/^(.+?)(\d+)\.(\S+)$/

Note that the dot must be escaped as \., since an unescaped . is a special character in a regex that matches any character, not just a literal period.
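As a rough sketch of how this pattern could be wrapped into the kind of function the question describes: the name splitFileName and the choice to return null for names that do not match are my own assumptions, not something from the original post.

```javascript
// Hypothetical helper: splits 'image01.png' into body, index, extension.
// Returns null when the name has no digits right before the extension.
function splitFileName(name) {
  var match = /^(.+?)(\d+)\.(\S+)$/.exec(name);
  // match[0] is the whole match; groups 1..3 are the three parts.
  return match ? match.slice(1) : null;
}

console.log(splitFileName('image01.png'));      // [ 'image', '01', 'png' ]
console.log(splitFileName('day12Image01.png')); // [ 'day12Image', '01', 'png' ]
console.log(splitFileName('readme.txt'));       // null

// With an unescaped dot the pattern also matches, but in the wrong
// place: the dot consumes the 'I' right after the first run of digits.
var unescaped = /^(.+?)(\d+).(\S+)$/.exec('day12Image01.png').slice(1);
console.log(unescaped); // [ 'day', '12', 'mage01.png' ]
```

The last call shows why the escape matters: for 'image01.png' both variants give the same result, so the problem only appears once the body itself contains digits.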

Upvotes: 0
