Reputation: 1841

Finding the word immediately after a character with regular expression

I am trying to find the word that comes immediately after '%' in the following lines:

RP/0/RP0/CPU0:Feb 26 20:04:01.869 UTC: esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR :SWITCH_LINK_ERR_E :DECLARE :0/RP0/CPU0/7:

LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: %PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed

To start, I used the following Python code, and it works on those lines.

result = re.search(r"\%.* \: ", txt)
result.group()

And here is the result:

However, my regex fails on lines like this:

LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc.

Upvotes: 1

Answers (4)

Georgina Skibinski

Reputation: 13377

Try:

import re

x="LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: %FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : A0 Errata: Observed RX CODE errors on link 120 , This is expected if you have A0 asic versions in the system and do triggers like OIR, reload etc."

res=re.findall(r"(?<=%)[^\s]+", x)

Outputs:

>>> res

['FABRIC-NPU_DRVR-1-PACIFIC_ERROR']

(?<=%)[^\s]+ - the first part, (?<=%), is a positive lookbehind: it matches only if % comes immediately before the current position, without including % in the result. The second part, [^\s]+, matches the word itself, meaning a run of one or more characters that aren't whitespace.
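To make the effect of the lookbehind concrete, here is a small sketch that runs the same idea with and without it on the second sample line from the question. It uses \S, which is equivalent to [^\s]; the variable names are only for illustration.

```python
import re

line = ("LC/0/9/CPU0:Feb 26 20:00:25.560 UTC: npu_drvr[253]: "
        "%PLATFORM-OFA-6-INFO : NPU #1 Initialization Completed")

# With the lookbehind, "%" must precede the match but is not part of it.
with_lookbehind = re.findall(r"(?<=%)\S+", line)

# Without the lookbehind, "%" is consumed and shows up in the result.
without_lookbehind = re.findall(r"%\S+", line)

print(with_lookbehind)     # ['PLATFORM-OFA-6-INFO']
print(without_lookbehind)  # ['%PLATFORM-OFA-6-INFO']
```

Note that findall returns every such token in the string, so a line containing several % signs would give several results.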

Upvotes: 0

kederrac

Reputation: 17322

You could use:

re.search(r'%([^\s]+)', s).group(1)

output (tested against the line for which your regex fails):

FABRIC-NPU_DRVR-1-PACIFIC_ERROR

Or you can use:

re.search(r'%(\S+)', s).group(1)  # \S is equivalent to [^\s]
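One thing to watch for: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a line with no '%' token. A minimal sketch of a guarded version follows; the helper name word_after_percent is hypothetical and the shortened log lines are only for illustration.

```python
import re

# Hypothetical helper, not part of the original answer.
def word_after_percent(line):
    """Return the token following the first '%', or None if there is none."""
    match = re.search(r"%(\S+)", line)
    # re.search returns None when the pattern is not found.
    return match.group(1) if match else None

found = word_after_percent("esd[361]: %PKT_INFRA-FM-3-FAULT_MAJOR : ALARM_MAJOR")
missing = word_after_percent("a line without a percent sign")

print(found)    # PKT_INFRA-FM-3-FAULT_MAJOR
print(missing)  # None
```

This is handy if you run the pattern over a whole log file, where not every line is guaranteed to carry a '%' tag.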

Upvotes: 1

DYZ

Reputation: 57033

What you want is a percent sign followed by one or more non-spaces:

re.search(r"%\S+", s)  # raw string, so \S is not treated as a string escape
#<_sre.SRE_Match object; span=(52, 84), match='%FABRIC-NPU_DRVR-1-PACIFIC_ERROR'>

Upvotes: 1

Zecong Hu

Reputation: 3184

Repetitions (* and +) in regular expressions default to "greedy" mode: they try to match the longest piece of text. In the failure case you provided, the message contains additional colons (:) after the word to match, so the greedy star * ran on to the last " : " in the line instead of stopping at the first one.

You can change the behavior to "lazy" (or "non-greedy") by adding a question mark (?) after the repetition, changing it to:

result = re.search(r"\%.*? \: ", txt)

Check out the results here. For more information, consider reading this article.
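As a quick way to see the difference yourself, here is a sketch that runs the greedy and lazy patterns side by side on a shortened copy of the failing line. The backslashes before % and : are dropped because neither character needs escaping, and the capture group in the last pattern is my addition for pulling out just the word.

```python
import re

# Shortened copy of the failing line from the question.
txt = ("LC/0/9/CPU0:Feb 27 15:33:58.509 UTC: npu_drvr[253]: "
       "%FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: "
       "For asic 0 : A0 Errata: Observed RX CODE errors on link 120")

# Greedy: .* grabs as much as possible, so the match ends at the LAST " : ".
greedy = re.search(r"%.* : ", txt).group()

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST " : ".
lazy = re.search(r"%.*? : ", txt).group()

# A capture group returns the word without "%" and the trailing " : ".
word = re.search(r"%(.*?) : ", txt).group(1)

print(repr(greedy))  # '%FABRIC-NPU_DRVR-1-PACIFIC_ERROR : [5821] : [PACIFIC A0]: For asic 0 : '
print(repr(lazy))    # '%FABRIC-NPU_DRVR-1-PACIFIC_ERROR : '
print(word)          # FABRIC-NPU_DRVR-1-PACIFIC_ERROR
```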

Upvotes: 2
