Nemes

Reputation: 1

Regex to extract last elements from a list

My input can be a list like the following: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10

I would need a regex that extracts the last 3 elements from the list above. So after applying the regex the list should look like: V8, V9, V10

I have tried the regex /^.*(?=((,.*){3})$)/. It seems I'm getting close, but I don't understand why the output is duplicated when I replace with $1.

Can anyone help me with an explanation? If I'm able to understand why this happens I should be able to correct it.

Upvotes: 0

Views: 1113

Answers (4)

The fourth bird

Reputation: 163632

The captured part in your pattern has 3 commas because the lookahead assertion repeats the group (,.*) three times, and each repetition starts with a comma.
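To make this concrete, here is a small sketch in JavaScript (an assumption, since the question names no language; the variable names are only illustrative). It shows what group 1 holds and one way the replacement can end up looking duplicated: the lookahead does not consume the tail, so the tail remains in the string right after the copy inserted by $1.

```javascript
const input = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";
const original = /^.*(?=((,.*){3})$)/;

// The overall match stops where the lookahead succeeds;
// group 1 holds the tail that the lookahead only inspected.
const m = input.match(original);
const matched = m[0];  // "V1, V2, V3, V4, V5, V6, V7"
const captured = m[1]; // ", V8, V9, V10"

// Only `matched` is replaced, so the unconsumed tail follows the copy.
const replaced = input.replace(original, "$1");
console.log(replaced); // ", V8, V9, V10, V8, V9, V10"
```

Other engines or a global replace may behave differently, so treat this as an illustration for JavaScript only.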

No language is specified, but depending on the supported regex flavor, you could also use a single capture group for the last 3 values, which are separated by 2 commas:

^.*(?<!\S)([^,\n]*(?:,[^,\n]*){2})$
  • ^ Start of string
  • .* Match the whole line, then backtrack until the rest of the pattern fits
  • (?<!\S) Assert a whitespace boundary to the left
  • ( Capture group 1
    • [^,\n]* Optionally match any char except , or a newline
    • (?:,[^,\n]*){2} Repeat 2 times matching the , and optional chars other than , or a newline
  • ) Close group 1
  • $ End of string
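As a quick check of the pattern above, a minimal JavaScript sketch could look like this. It assumes an engine with lookbehind support (ES2018 or later); the variable names are only illustrative.

```javascript
const input = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";
const lastThreePattern = /^.*(?<!\S)([^,\n]*(?:,[^,\n]*){2})$/;

// The whole line is matched, so replacing it with group 1
// leaves only the last three values.
const lastThree = input.replace(lastThreePattern, "$1");
console.log(lastThree); // "V8, V9, V10"
```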

Regex demo

Alternatively, split on the comma, take the last 3 elements of the resulting collection, and join them back together with the same delimiter.

An example using JavaScript:

const str = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";
console.log(str.split(", ").slice(-3).join(", "));

Upvotes: 0

morz79

Reputation: 11

In your case, use ?: instead of ?= and you will get , V8, V9, V10

([a-zA-Z0-9]+(?:,\s[a-zA-Z0-9]+){2})$ returns the same result without the preceding comma: V8, V9, V10
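Both variants can be tried with a short JavaScript sketch (the language is an assumption; the variable names are illustrative):

```javascript
const input = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";

// Non-capturing group instead of a lookahead: the tail is consumed,
// so the replacement is no longer followed by a second copy.
const withLeadingComma = input.replace(/^.*(?:((,.*){3})$)/, "$1");
console.log(withLeadingComma); // ", V8, V9, V10"

// Second pattern: capture three alphanumeric values at the end of the string.
const m = input.match(/([a-zA-Z0-9]+(?:,\s[a-zA-Z0-9]+){2})$/);
const lastThree = m ? m[1] : null;
console.log(lastThree); // "V8, V9, V10"
```

Note that the second pattern only accepts letters and digits in each value, so values containing other characters would need a wider character class.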

Upvotes: 1

Wiktor Stribiżew

Reputation: 627600

You can simply extract the last three comma-separated values using

[^\s,][^,]*(?=(?:,[^,]*){0,2}$)

See the regex demo. Details:

  • [^\s,] - a char other than whitespace and comma
  • [^,]* - any zero or more chars other than a comma
  • (?=(?:,[^,]*){0,2}$) - a positive lookahead that requires zero, one or two occurrences of a comma and then zero or more non-comma chars till the end of string immediately to the right of the current location.
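Because this pattern matches each value separately, it needs a global search rather than a replacement. A minimal sketch in JavaScript (assumed here; the variable names are illustrative):

```javascript
const input = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";

// With the g flag, match() returns every value that has
// at most two comma-separated values after it.
const values = input.match(/[^\s,][^,]*(?=(?:,[^,]*){0,2}$)/g) || [];
console.log(values);            // ["V8", "V9", "V10"]
console.log(values.join(", ")); // "V8, V9, V10"
```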

You get your match replaced twice because of a known issue, see Why do some regex engines match .* twice in a single input string?.

Upvotes: 1

JGNI

Reputation: 4013

The following regex should work

/^.*, ([^,]+, [^,]+, [^,]+)$/
  • ^ Match the beginning of the string
  • .* Match 0 or more characters, taking as many as possible
  • , Match a comma followed by a space
  • ( Begin capture
  • [^,]+ Match one or more characters that are not commas
  • This is repeated, separated by a comma and a space, so we can match the last 3 elements
  • ) End capture
  • $ Match end of string
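A short JavaScript sketch of this pattern in use (the language is an assumption; the variable names are illustrative):

```javascript
const input = "V1, V2, V3, V4, V5, V6, V7, V8, V9, V10";

// The greedy .* backtracks until exactly three values remain after ", ".
const m = input.match(/^.*, ([^,]+, [^,]+, [^,]+)$/);
const lastThree = m ? m[1] : null;
console.log(lastThree); // "V8, V9, V10"
```

Since the pattern requires a ", " before the captured group, it will not match a list that has only three elements.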

Upvotes: 1
