Prasad Chanaka

Reputation: 19

Python regex extraction

I have a string called "strtosearch2" like this:

[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA
[02112017 072755 332][1][INFO]> ----              
[02112017 104502 724][1][ERROR]> ----Message : DEV_NOT_READY
[02112017 104502 724][1][INFO]> ----              
[02112017 104503 331][1][ERROR]> ----Message : DEV_NOT_READY
[02112017 104503 331][1][INFO]> ----  

I want to extract the dates only from the lines that contain "ERROR". I wrote my regex as follows:

down2Date= re.findall(r'\[(.*?)\s\d{6}\s\d{3}\]\[\d\]\[ERROR\]',strtosearch2,re.DOTALL)

The output is as follows:

02112017
02112017 072755 332][1][INFO]> ----              
[02112017
02112017 104502 724][1][INFO]> ----              
[02112017

My target output:

02112017
02112017
02112017

How can I fix this? Thank you.

Upvotes: 1

Views: 51

Answers (2)

Wiktor Stribiżew

Reputation: 627371

You may anchor the pattern at the start of the line/string and remove the re.DOTALL modifier:

re.findall(r'(?m)^\[(.*?)\s\d{6}\s\d{3}]\[\d]\[ERROR]', s)

See the regex demo

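With re.DOTALL, the . matches any char including line break chars, so the lazy (.*?) can run past the end of an INFO line and keep going until it reaches the next ERROR entry.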

With (?m), ^ matches the start of each line, not only the start of the whole string.

Also, \s can match line break chars, so you might want to use [^\S\r\n] instead of it to only match horizontal whitespace.
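The points above can be put together in a short sketch. The sample text is copied from the question (trailing spaces dropped), the variable names `s`, `pattern` and `dates` are only illustrative, and `[^\S\r\n]` is used in place of `\s` as suggested:

```python
import re

# Sample log text from the question; the name `s` is illustrative.
s = (
    "[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA\n"
    "[02112017 072755 332][1][INFO]> ----\n"
    "[02112017 104502 724][1][ERROR]> ----Message : DEV_NOT_READY\n"
    "[02112017 104502 724][1][INFO]> ----\n"
    "[02112017 104503 331][1][ERROR]> ----Message : DEV_NOT_READY\n"
    "[02112017 104503 331][1][INFO]> ----\n"
)

# (?m) lets ^ match at the start of every line.
# [^\S\r\n] matches a whitespace char that is not a line break.
# Without re.DOTALL, (.*?) cannot cross a line break.
pattern = r'(?m)^\[(.*?)[^\S\r\n]\d{6}[^\S\r\n]\d{3}]\[\d]\[ERROR]'

dates = re.findall(pattern, s)
print(dates)  # ['02112017', '02112017', '02112017']
```

Because the pattern has exactly one capture group, re.findall returns only the captured date for each matching line.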

Upvotes: 2

Denis Rasulev

Reputation: 4069

Try this:

down2Date = re.findall(r'^\[(\d+)\s\d+\s\d+\]\[\d\]\[ERROR\]', strtosearch2, re.MULTILINE)
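With a digits-only pattern like this, two details decide what re.findall returns: whether ^ may match at every line (re.MULTILINE) and whether the date is wrapped in a capture group. A minimal sketch, assuming a shortened sample string with the hypothetical name `log`:

```python
import re

# Shortened sample based on the question's data; the name `log` is illustrative.
log = (
    "[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA\n"
    "[02112017 072755 332][1][INFO]> ----\n"
    "[02112017 104502 724][1][ERROR]> ----Message : DEV_NOT_READY\n"
)

# No flag and no group: ^ matches only at the very start of the string,
# and findall returns the whole matched text.
whole = re.findall(r'^\[\d+\s\d+\s\d+\]\[\d\]\[ERROR\]', log)
print(whole)  # ['[02112017 072755 332][1][ERROR]']

# re.MULTILINE plus a group around the first run of digits:
# one date for each line that starts with an ERROR entry.
dates_only = re.findall(r'^\[(\d+)\s\d+\s\d+\]\[\d\]\[ERROR\]', log, re.MULTILINE)
print(dates_only)  # ['02112017', '02112017']
```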

Upvotes: 0
