Python regex extraction

Question

I have a string called "strtosearch2" like this:

[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA
[02112017 072755 332][1][INFO]> ----              
[02112017 104502 724][1][ERROR]> ----Message : DEV_NOT_READY
[02112017 104502 724][1][INFO]> ----              
[02112017 104503 331][1][ERROR]> ----Message : DEV_NOT_READY
[02112017 104503 331][1][INFO]> ----

I want to extract the dates only from the lines that contain "ERROR". I wrote my regex as follows:

down2Date = re.findall(r'\[(.*?)\s\d{6}\s\d{3}\]\[\d\]\[ERROR\]', strtosearch2, re.DOTALL)

The output is as follows:

02112017
02112017 072755 332][1][INFO]> ----              
[02112017
02112017 104502 724][1][INFO]> ----              
[02112017

My target output:

02112017
02112017
02112017

How can I fix this? Thank you.

Wiktor Stribiżew · Accepted Answer

You may anchor the pattern at the start of the line/string and remove the re.DOTALL modifier:

re.findall(r'(?m)^\[(.*?)\s\d{6}\s\d{3}]\[\d]\[ERROR]', s)

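With re.DOTALL, the . matches any character, including line break characters, so the lazy .*? could start at the [ of a non-ERROR line and keep expanding into the next line until it reached an [ERROR] tag.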

With (?m), ^ matches the start of each line, not only the start of the whole string.
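
To make the difference concrete, here is a minimal sketch that runs both patterns on the sample from the question. The variable names s, unanchored and anchored are just illustrative, and the trailing spaces on the INFO lines are left out for brevity.

```python
import re

# Sample log text copied from the question (trailing spaces omitted).
s = (
    "[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA\n"
    "[02112017 072755 332][1][INFO]> ----\n"
    "[02112017 104502 724][1][ERROR]> ----Message : DEV_NOT_READY\n"
    "[02112017 104502 724][1][INFO]> ----\n"
    "[02112017 104503 331][1][ERROR]> ----Message : DEV_NOT_READY\n"
    "[02112017 104503 331][1][INFO]> ----\n"
)

# Original attempt: unanchored and with re.DOTALL, so .*? may cross line breaks.
unanchored = re.findall(r'\[(.*?)\s\d{6}\s\d{3}\]\[\d\]\[ERROR\]', s, re.DOTALL)

# Fixed pattern: (?m) lets ^ match at every line start, and without re.DOTALL
# the . stops at the end of the line.
anchored = re.findall(r'(?m)^\[(.*?)\s\d{6}\s\d{3}]\[\d]\[ERROR]', s)

print(unanchored)  # second and third items span two lines
print(anchored)    # ['02112017', '02112017', '02112017']
```

The unanchored version still returns three items, but the second and third start on an INFO line and swallow the line break, which is the stray output shown in the question.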

Also, \s can match line break chars, so you might want to use [^\S\r\n] instead of it to only match horizontal whitespace.
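
As a rough illustration of that last point, the sketch below first shows \s matching a line break in a toy string, then plugs the horizontal-whitespace class into the pattern. The names H, pattern and sample are made up for this example.

```python
import re

# \s matches a newline, while [^\S\r\n] (whitespace except CR and LF) does not.
print(re.findall(r'a\sb', 'a\nb'))          # ['a\nb']
print(re.findall(r'a[^\S\r\n]b', 'a\nb'))   # []
print(re.findall(r'a[^\S\r\n]b', 'a\tb'))   # ['a\tb']

# Same pattern as above, with \s replaced by the horizontal-whitespace class.
H = r'[^\S\r\n]'
pattern = re.compile(r'(?m)^\[(.*?)' + H + r'\d{6}' + H + r'\d{3}]\[\d]\[ERROR]')

sample = (
    "[02112017 072755 332][1][ERROR]> ----Message : IDC_NO_MEDIA\n"
    "[02112017 072755 332][1][INFO]> ----\n"
)
print(pattern.findall(sample))  # ['02112017']
```

For the log format in the question both versions give the same result; the stricter class only matters if a field separator could ever be a line break.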
