hello

Reputation: 1228

matching group getting added to next group

I am trying to use this regex

(\d{4}-\d{2}-\d{2})\s(\d{2}\:\d{2}\:\d{2})\.\d\s([A-Z]{3})((\d{1,3}\.){3}\d{1,3}(?:(:\d{1,5})?))\s?(.*)

to match a string such as

2021-05-20 11:03:00.0 GMT222.111.222.33

and

2021-05-20 11:03:00.0 GMT222.111.222.33:2323

In the case of this string:

2021-05-20 11:03:00.0 GMT222.111.222.33:2323444 first second

When the port number has more than 5 digits, the extra digits spill into the next capture group (44 first second) instead of the whole string failing to match. The port number is meant to be optional (captured together with the IP group if it exists).

The text after the port number is optional as well, but it should be captured when present.

There is usually a space between the IP (and port) and the text after it.

Strings that should match:

2021-05-20 11:03:00.0 GMT222.111.222.33

2021-05-20 11:03:00.0 GMT222.111.222.33 first second

2021-05-20 11:03:00.0 GMT222.111.222.33:23232

2021-05-20 11:03:00.0 GMT222.111.222.33:23232 first second

Example of a string that should not match:

2021-05-20 11:03:00.0 GMT222.111.222.33: first second

(if the colon is present but the port is absent or has more than 5 digits)

2021-05-20 11:03:00.0 GMT222.111.222.33first second

or

2021-05-20 11:03:00.0 GMT222.111.222.33:23232first second

RegEx101

How may I fix this?

Upvotes: 0

Views: 97

Answers (3)

The fourth bird

Reputation: 163207

You get a partial match because there is no clear boundary after the port, and both \s? and (.*) at the end are optional.

The .* can match any character, so the digits after matching :\d{1,5} will land in that last group.
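To make the spill-over concrete, here is a small Python sketch that runs the question's pattern unchanged against the over-long port example. The names ORIGINAL, ip_and_port and trailing are just illustrative, and Python's re module is an assumption (the question does not say which regex flavor is used).

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"(\d{4}-\d{2}-\d{2})\s(\d{2}\:\d{2}\:\d{2})\.\d\s([A-Z]{3})"
    r"((\d{1,3}\.){3}\d{1,3}(?:(:\d{1,5})?))\s?(.*)"
)

line = "2021-05-20 11:03:00.0 GMT222.111.222.33:2323444 first second"
m = ORIGINAL.search(line)

# :\d{1,5} stops after five digits, \s? matches nothing,
# and (.*) takes whatever is left, including the surplus digits.
ip_and_port = m.group(4)
trailing = m.group(7)

print(ip_and_port)  # 222.111.222.33:23234
print(trailing)     # 44 first second
```

Nothing in the pattern says what must follow the port, so the engine is free to stop the port at five digits and hand the rest to the last group.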

Also note that you don't have to escape the colon \:

You might use word boundaries and, at the end, a negative lookahead asserting that what is directly to the right is not : followed by a digit: \b(?!:\d)

\b(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\.\d\s([A-Z]{3})((?:\d{1,3}\.){3}\d{1,3}(:\d{1,5})?)\b(?!:\d)

See a regex demo.
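As a rough check of the word-boundary version, the sketch below tries it on three of the question's strings. It assumes Python's re module; BOUNDED and ip_port are hypothetical names used only for this illustration.

```python
import re

# Word boundaries plus a negative lookahead, as in the pattern above.
BOUNDED = re.compile(
    r"\b(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\.\d\s([A-Z]{3})"
    r"((?:\d{1,3}\.){3}\d{1,3}(:\d{1,5})?)\b(?!:\d)"
)

def ip_port(line):
    """Return the IP (with port, if any) from group 4, or None if no match."""
    m = BOUNDED.search(line)
    return m.group(4) if m else None

print(ip_port("2021-05-20 11:03:00.0 GMT222.111.222.33"))
print(ip_port("2021-05-20 11:03:00.0 GMT222.111.222.33:2323"))
# Seven-digit port: \b cannot sit between two digits, and dropping the
# port is blocked by (?!:\d), so there is no match at all.
print(ip_port("2021-05-20 11:03:00.0 GMT222.111.222.33:2323444 first second"))
```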

EDIT

After the updated question, you might use:

^(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\.\d\s([A-Z]{3})((?:\d{1,3}\.){3}\d{1,3}(:\d{1,5})?)(?!\S)(.*)$
  • ^ Start of string
  • (\d{4}-\d{2}-\d{2})\s Capture group 1
  • (\d{2}:\d{2}:\d{2}) Capture group 2
  • \.\d\s Match . digit and whitespace char
  • ([A-Z]{3}) Capture group 3, match 3 uppercase chars
  • ( Capture group 4
    • (?:\d{1,3}\.){3}\d{1,3} Match an ip like format
    • (:\d{1,5})? Optional group 5, match : 1-5 digits
  • )(?!\S) Close group 4, then assert a whitespace boundary to the right (the next character, if any, must be whitespace)
  • (.*) Capture group 6, match any characters
  • $ End of string

Regex demo
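The anchored pattern can be tried against the samples from the question with a short Python sketch. This assumes Python's re module; LINE_RE, parse_line and the dictionary keys are made-up names for the illustration, not part of the question.

```python
import re

# The anchored pattern from the edit above.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s(\d{2}:\d{2}:\d{2})\.\d\s([A-Z]{3})"
    r"((?:\d{1,3}\.){3}\d{1,3}(:\d{1,5})?)(?!\S)(.*)$"
)

def parse_line(line):
    """Return the captured parts as a dict, or None when the line is rejected."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "date": m.group(1),
        "time": m.group(2),
        "zone": m.group(3),
        "address": m.group(4),  # IP, plus ":port" when present
        "port": m.group(5),     # None when there is no port
        "rest": m.group(6),     # keeps its leading space; (?!\S) consumes nothing
    }

should_match = [
    "2021-05-20 11:03:00.0 GMT222.111.222.33",
    "2021-05-20 11:03:00.0 GMT222.111.222.33 first second",
    "2021-05-20 11:03:00.0 GMT222.111.222.33:23232",
    "2021-05-20 11:03:00.0 GMT222.111.222.33:23232 first second",
]
should_not_match = [
    "2021-05-20 11:03:00.0 GMT222.111.222.33: first second",
    "2021-05-20 11:03:00.0 GMT222.111.222.33:2323444 first second",
    "2021-05-20 11:03:00.0 GMT222.111.222.33first second",
    "2021-05-20 11:03:00.0 GMT222.111.222.33:23232first second",
]

for s in should_match + should_not_match:
    print(repr(s), "->", parse_line(s))
```

Because the lookahead does not consume anything, the trailing text in group 6 starts with the separating space; strip it afterwards if that matters.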

Upvotes: 2

Ron

Reputation: 6551

This is the fix:

^(\d{4}-\d{2}-\d{2})\s(\d{2}\:\d{2}\:\d{2})\.\d\s([A-Z]{3})((\d{1,3}\.){3}\d{1,3}((:\d{1,5})?))$

Example: https://regex101.com/r/oF7HI2/1
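For comparison, here is a quick sketch of how this $-anchored pattern behaves on a few of the question's strings, assuming Python's re module (ANCHORED and matches are illustrative names only). Since nothing is allowed between the port and the end of the string, lines with trailing text are rejected as well.

```python
import re

# The pattern from this answer, unchanged.
ANCHORED = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s(\d{2}\:\d{2}\:\d{2})\.\d\s([A-Z]{3})"
    r"((\d{1,3}\.){3}\d{1,3}((:\d{1,5})?))$"
)

def matches(line):
    """True when the whole line is accepted by the anchored pattern."""
    return ANCHORED.search(line) is not None

print(matches("2021-05-20 11:03:00.0 GMT222.111.222.33:2323"))          # True
print(matches("2021-05-20 11:03:00.0 GMT222.111.222.33:2323444"))       # False
print(matches("2021-05-20 11:03:00.0 GMT222.111.222.33 first second"))  # False
```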

Upvotes: 2

user17492848

Reputation:

I'm not sure I fully understand the problem but this could be the solution:

(\d{4}-\d{2}-\d{2})\s(\d{2}\:\d{2}\:\d{2})\.\d\s([A-Z]{3})((\d{1,3}\.){3}\d{1,3}(?:(:\d{1,})?))\s?(?>.*?)

Upvotes: 2
