Reputation: 2906
I have the following string:
string = "1231231223123131_FILE_NAME.EXTENSION.OTHEREXTENSION"
and the following regular expression:
string.match(/^\d+_([^.]+\.[^.]+)/)[1]
the regular expression returns:
=> FILE_NAME.EXTENSION
While I understand that ^\d+_ means "find one or more digits followed by an underscore", my confusion is with the capture group. In particular, why does the first [^.]+ seem to return one or more characters before a period and then include the period, while the second [^.]+ excludes the second period?
This regular expression combination is even more confusing when you remove the first [^.]+, because then it returns the .OTHEREXTENSION as well.
From my understanding, a caret inside square brackets, as in [^.], means to exclude whatever follows it. So why in this instance is it including all characters up to and after the first period?
Upvotes: 1
Views: 45
Reputation: 3553
In your regex, you have [^.]+\.[^.]+
The first [^.]+ stands for one or more non-period characters, which matches FILE_NAME and stops when it reaches the period.
\. matches a single literal period, which it does (after FILE_NAME but before EXTENSION.OTHEREXTENSION).
The next [^.]+ matches one or more non-period characters again, which is EXTENSION, and stops again when it reaches the next period.
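To see which piece of the pattern consumes which part of the input, here is a minimal sketch, assuming the question's code is Ruby (the => output suggests an irb session). The named groups first, dot and second are illustrative additions, not part of the original regex.

```ruby
string = "1231231223123131_FILE_NAME.EXTENSION.OTHEREXTENSION"

# Same pattern as in the question, but each piece gets its own named group
# so the individual matches can be inspected (names are hypothetical).
m = string.match(/^\d+_(?<first>[^.]+)(?<dot>\.)(?<second>[^.]+)/)

puts m[:first]   # FILE_NAME  (first [^.]+ stops at the first period)
puts m[:dot]     # .          (the escaped \. matches the period itself)
puts m[:second]  # EXTENSION  (second [^.]+ stops at the second period)
```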
Upvotes: 2
Reputation: 22941
Your capturing group says: one or more characters that are not a . then a single . then another run of non-. characters. The . in your result is not coming from the first [^.]+, it's coming from the \. that follows it and is still within the capturing group.
FILE_NAME.EXTENSION meets those criteria. FILE_NAME matches the first character class one or more times. This is followed by a dot, which matches \.
Then the word EXTENSION matches the second character class one or more times. When it reaches the second dot, the capturing group comes to an end, since the regex contains nothing further to match a second .
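As a quick check of where the match ends, the following sketch (assuming Ruby, as the => output in the question suggests) prints the captured group and the unmatched remainder. The longer variable and its extended pattern are hypothetical, added only to show that a second \.[^.]+ is what it would take to capture the second dot.

```ruby
string = "1231231223123131_FILE_NAME.EXTENSION.OTHEREXTENSION"

m = string.match(/^\d+_([^.]+\.[^.]+)/)
puts m[1]          # FILE_NAME.EXTENSION
puts m.post_match  # .OTHEREXTENSION  (left over: nothing in the regex matches a second dot)

# Illustrative variation: one more \.[^.]+ lets the group continue past the second dot.
longer = string.match(/^\d+_([^.]+\.[^.]+\.[^.]+)/)[1]
puts longer        # FILE_NAME.EXTENSION.OTHEREXTENSION
```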
Upvotes: 1