Lucas H. Xu

Reputation: 905

How to Split and Extract Characters from A String Using One Expression in Regex

Example:

aaa.bbbb.ccc4.ddd1.eee.fff
1112.2223.333.4445.555.6661.7773.8881.999

How can I return ddd and 777 using a single expression? They are always the first 3 characters of the third-to-last string between dots.

I know how to do this with two expressions:

`[^\.]+\.[^\.]+\.[^\.]+$`
`^\w{3}`

Is there a way to combine them, so that the second expression is applied not to the original string but to the result of the first?
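
For reference, the two-step approach described above can be sketched as follows. Python is only an assumed host language here (the question does not name one), and the function name is made up for illustration.

```python
import re

def first_three_two_steps(s):
    """Hypothetical helper: apply the two expressions one after the other."""
    # Step 1: isolate the last three dot-separated parts.
    tail = re.search(r"[^\.]+\.[^\.]+\.[^\.]+$", s)
    if tail is None:
        return None
    # Step 2: run the second expression on the result of the first.
    head = re.search(r"^\w{3}", tail.group(0))
    return head.group(0) if head else None

print(first_three_two_steps("aaa.bbbb.ccc4.ddd1.eee.fff"))  # ddd
print(first_three_two_steps("1112.2223.333.4445.555.6661.7773.8881.999"))  # 777
```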

Upvotes: 2

Views: 881

Answers (3)

JvdV

Reputation: 75840

Here is another option:

(?=(\.[^.]*){3}$)\.(.{3})

Where you'd match:

  • (?= - Positive lookahead.
    • (\.[^.]*){3} - 1st capture group: match a literal dot, then anything but a dot zero or more times. Repeat the capture group three times.
    • $) - End-of-string anchor and close lookahead.
  • \. - A literal dot.
  • (.{3}) - 2nd capture group to capture the first three characters after the dot.

Extract from the 2nd capture group. Or, if you want, you could use a non-capture group inside the lookahead and extract from the 1st capture group instead: (?=(?:\.[^.]*){3}$)\.(.{3})
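
A minimal sketch of how the non-capture variant might be used, assuming Python's re module; the names PATTERN and extract are illustrative only.

```python
import re

# Non-capture variant from this answer: the lookahead checks that exactly
# three ".part" pieces remain before the end of the string.
PATTERN = re.compile(r"(?=(?:\.[^.]*){3}$)\.(.{3})")

def extract(s):
    """Hypothetical helper: return capture group 1, or None if no match."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

print(extract("aaa.bbbb.ccc4.ddd1.eee.fff"))  # ddd
print(extract("1112.2223.333.4445.555.6661.7773.8881.999"))  # 777
```

Because the pattern requires a literal dot in front of the target, a string with only three parts such as ddd1.eee.fff produces no match.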

Upvotes: 1

Cary Swoveland

Reputation: 110675

You could match the regular expression

(?<=\.).{3}(?=[^.]*(?:\.[^.]*){2}$)

The regex engine performs the following operations.

(?<=\.)        : positive lookbehind asserts previous
                 char was '.'
.{3}           : match 3 chars
(?=            : begin positive lookahead
  [^.]*        : match 0+ chars other than '.'
  (?:\.[^.]*)  : match '.' then 0+ chars other than
                 '.' in a non-capture group
  {2}          : execute non-capture group twice
  $            : assert end of string
)              : end positive lookahead
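
As a rough illustration, assuming Python's re module (which supports this fixed-width lookbehind), the whole match is already the wanted text, so no capture group is needed. The names below are made up for the example.

```python
import re

# Lookbehind plus lookahead: only the three characters themselves are consumed.
LOOKAROUND = re.compile(r"(?<=\.).{3}(?=[^.]*(?:\.[^.]*){2}$)")

def extract_full_match(s):
    """Hypothetical helper: return the full match, or None if there is none."""
    m = LOOKAROUND.search(s)
    return m.group(0) if m else None

print(extract_full_match("aaa.bbbb.ccc4.ddd1.eee.fff"))  # ddd
print(extract_full_match("1112.2223.333.4445.555.6661.7773.8881.999"))  # 777
```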

Another way would be to use the regular expression

(?=\.(.{3})[^.]*(?:\.[^.]*){2}$)

capturing the desired 3-character string in capture group 1.

(?=            : begin positive lookahead
  \.           : match '.'
  (.{3})       : match 3 chars in capture group 1
  [^.]*        : match 0+ chars other than '.'
  (?:\.[^.]*)  : match '.' then 0+ chars other than
                 '.' in a non-capture group
  {2}          : execute non-capture group twice
  $            : assert end of string
)              : end positive lookahead

If the match succeeds, the overall match is an empty string (the lookahead is zero-width, so nothing is consumed), but it is the contents of capture group 1 that are of interest.
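
A short sketch of the capture-group variant, again assuming Python's re module and using illustrative names. It shows that the overall match is empty while group 1 holds the result.

```python
import re

# The entire pattern is a lookahead, so the overall match has zero width.
LOOKAHEAD_ONLY = re.compile(r"(?=\.(.{3})[^.]*(?:\.[^.]*){2}$)")

def extract_group(s):
    """Hypothetical helper: return (full match, capture group 1) or None."""
    m = LOOKAHEAD_ONLY.search(s)
    return (m.group(0), m.group(1)) if m else None

print(extract_group("aaa.bbbb.ccc4.ddd1.eee.fff"))  # ('', 'ddd')
print(extract_group("1112.2223.333.4445.555.6661.7773.8881.999"))  # ('', '777')
```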

Upvotes: 3

The fourth bird

Reputation: 163237

You could match a dot, capture the next 3 characters in a capturing group, and then match any character except a dot 0+ times up to the next dot.

Then match the last 2 parts and assert the end of the string.

\.([^.]{3})[^.]*\.[^.]+\.[^.]+$

If nothing precedes the third-to-last part (the string has only three parts), you could either match a dot or assert the start of the string.

(?:^|\.)([^.]{3})[^.]*\.[^.]+\.[^.]+$

Note that [^.] can also match a space or a newline. To exclude whitespace as well, use [^\s.] in place of [^.].
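
The difference between the two patterns above can be checked with a small sketch, assuming Python's re module; the constant and function names are invented for the example.

```python
import re

# First pattern: requires a dot before the third-to-last part.
DOT_ONLY = re.compile(r"\.([^.]{3})[^.]*\.[^.]+\.[^.]+$")
# Second pattern: also accepts the start of the string instead of a dot.
DOT_OR_START = re.compile(r"(?:^|\.)([^.]{3})[^.]*\.[^.]+\.[^.]+$")

def capture(pattern, s):
    """Hypothetical helper: return capture group 1, or None if no match."""
    m = pattern.search(s)
    return m.group(1) if m else None

print(capture(DOT_ONLY, "aaa.bbbb.ccc4.ddd1.eee.fff"))  # ddd
print(capture(DOT_ONLY, "ddd1.eee.fff"))                # None
print(capture(DOT_OR_START, "ddd1.eee.fff"))            # ddd
```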

Upvotes: 3