dutch99

Reputation: 3

Trying to build regex that I can use to match on nth value

My string:

DATADOG [ALERT:cpu] [VENDOR:rancher-2] [ENV:np] [CLUSTER:local-oma2] [HOST:servername1]

My goal:

I'm interested in the value in each one of the brackets. So:

I have a regex that seems to match what I need, but I can't use it to select a specific value by position. The regex is:

(?<=:).+?(?=])

I'd like something like this:

(?<=:).+?(?=]){1}

that returns cpu and

(?<=:).+?(?=]){2}

that returns rancher-2 and

(?<=:).+?(?=]){3}

that returns np and so on.

Any help would be greatly appreciated!

Upvotes: 0

Views: 93

Answers (3)

Bohemian

Reputation: 424983

To match any (i.e. all) of them:

(?<=:)[^\]]+

See live demo.
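As a rough sketch of how this pattern could be used to pick out the nth value, assuming Python's built-in re module: collect every match and index into the resulting list. The helper name nth_value is made up for illustration and is not part of any library.

```python
import re

s = 'DATADOG [ALERT:cpu] [VENDOR:rancher-2] [ENV:np] [CLUSTER:local-oma2] [HOST:servername1]'

def nth_value(text, n):
    """Return the nth (1-based) value that follows a colon, or None if out of range.

    Hypothetical helper for illustration only.
    """
    # The lookbehind requires a ':' just before the match; the match itself
    # runs up to (but not including) the next ']'.
    values = re.findall(r'(?<=:)[^\]]+', text)
    return values[n - 1] if 1 <= n <= len(values) else None

print(nth_value(s, 1))
print(nth_value(s, 3))
```

Indexing the list of matches keeps the regex itself simple, at the cost of scanning the whole string even when only an early value is needed.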

You can't match just the nth one on its own, but you can capture it as group 1:

^(?:[^:]*:){n}([^\]]+)

See live demo.

where n is 1 for cpu, 2 for rancher-2, etc.
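A minimal sketch of the group-1 approach, assuming Python's built-in re module. The function name capture_nth is hypothetical; it substitutes n into the quantifier and reads back group 1.

```python
import re

s = 'DATADOG [ALERT:cpu] [VENDOR:rancher-2] [ENV:np] [CLUSTER:local-oma2] [HOST:servername1]'

def capture_nth(text, n):
    """Return the value after the nth colon (n >= 1), or None if there is no match.

    Hypothetical helper for illustration only.
    """
    # (?:[^:]*:){n} skips n runs of "anything up to and including a colon";
    # group 1 then captures everything up to the next ']'.
    pattern = r'^(?:[^:]*:){%d}([^\]]+)' % n
    m = re.match(pattern, text)
    return m.group(1) if m else None

print(capture_nth(s, 1))
print(capture_nth(s, 2))
```

Note that this counts colons, so it assumes the values themselves contain no colon.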

Upvotes: 1

Cary Swoveland

Reputation: 110665

I have assumed that one function of the regular expression is to confirm that each matched colon is preceded by a left bracket followed by a word in caps.

You could use the following regular expression to match the text of interest.

\[[A-Z]+:\K[^\]]+(?=\])

PCRE demo

The PCRE engine performs the following operations.

\[        match '['
[A-Z]+:   match 1+ uppercase letters followed by ':'
\K        forget everything matched so far
[^\]]+    match 1+ chars other than ']'
(?=\])    the following character must be ']'

If you want a specific match, such as that following "ENV:" in the example, replace [A-Z]+: in the regex with ENV:.
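For a lookup by tag name in Python, one possible sketch follows. Python's built-in re module does not accept \K, so this version swaps in two capturing groups (tag and value) instead; it is an adaptation, not the regex above. The variable name pairs is chosen here for illustration.

```python
import re

s = 'DATADOG [ALERT:cpu] [VENDOR:rancher-2] [ENV:np] [CLUSTER:local-oma2] [HOST:servername1]'

# Group 1 is the uppercase tag, group 2 is the value up to the closing ']'.
# findall returns (tag, value) tuples, which dict() turns into a lookup table.
pairs = dict(re.findall(r'\[([A-Z]+):([^\]]+)\]', s))

print(pairs['ENV'])
print(pairs['HOST'])
```

Building the dictionary once makes it easy to fetch any value by its tag rather than by its position.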

\K resets the starting point of the reported match. It is available in various languages. In addition, it is supported by R's engine (with perl=TRUE) and Python's alternative "regex" module.

Some languages that do not support \K do support variable-length lookbehinds (e.g., .NET and JavaScript). For those languages, the following regex could be used.

(?<=\[[A-Z]+:)[^\]]+(?=\])

Demo

Upvotes: 0

Sachin Gupta

Reputation: 196

A simple regex, ':(.*?)]', is enough. With re.findall you get a list of all the values, which you can then index to pick the one you want.

    import re

    s = 'DATADOG [ALERT:cpu] [VENDOR:rancher-2] [ENV:np] [CLUSTER:local-oma2] [HOST:servername1]'
    # Lazily capture everything between each ':' and the next ']'
    result = re.findall(r':(.*?)\]', s)
    print(result)

Output:

['cpu', 'rancher-2', 'np', 'local-oma2', 'servername1']

Upvotes: 0
