ArunK

Reputation: 1781

Python regex: capture values that also contain newline characters

I am stuck on a regex problem that I can't solve. This is the regex I have so far:

import re
message = """[key    X] value
[key    X]  value value
[key    X]  value
value
value
value
[key     ] value
[key     ] ?
[key     ] ?"""

messageRegex = re.compile(r"\[(.*?)][\s](.*)")

for value in messageRegex.findall(message):
    print(value)

The output is shown below. Not everything is captured: the lines that continue the third value are missing.

('key    X', 'value')
('key\tX', 'value value')
('key\tX', 'value')
('key\t ', 'value')
('key\t ', '?')
('key\t ', '?')


I would expect the output to look like

('key    X', 'value')
('key\tX', 'value value')
('key\tX', 'value \nvalue \nvalue \nvalue')
('key\t ', 'value')
('key\t ', '?')
('key\t ', '?')

Upvotes: 2

Views: 143

Answers (1)

Wiktor Stribiżew

Reputation: 626845

You may use

(?m)^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)

See the regex demo

Details

  • (?m) - multiline mode, so that ^ matches at the start of every line (re.M in the Python code)
  • ^ - start of a line
  • \[ - a [ char
  • ([^][]*) - Group 1: zero or more chars other than [ and ]
  • ] - a ] char
  • \s+ - one or more whitespace chars
  • (.*(?:\n(?!\[[^][]*]).*)*) - Group 2:
    • .* - the rest of the line
    • (?:\n(?!\[[^][]*]).*)* - zero or more repetitions of:
      • \n(?!\[[^][]*]) - a newline that is not followed by a [...] substring
      • .* - the rest of the line
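
The breakdown above can also be written straight into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of the same regex in a more readable layout; the names ENTRY_RE and sample are made up for the illustration and are not part of the question.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each part carries a comment.
# ENTRY_RE and sample are hypothetical names used only for this illustration.
ENTRY_RE = re.compile(
    r"""
    ^                       # start of a line (needs re.MULTILINE)
    \[ ([^][]*) ]           # Group 1: the text between [ and ]
    \s+                     # one or more whitespace chars
    (                       # Group 2:
      .*                    #   the rest of the current line
      (?:                   #   then zero or more continuation lines:
        \n (?!\[[^][]*])    #     a newline not followed by a bracketed key
        .*                  #     and the rest of that line
      )*
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = "[a] one\ntwo\n[b] three"
print(ENTRY_RE.findall(sample))
# [('a', 'one\ntwo'), ('b', 'three')]
```

The verbose form matches exactly the same strings as the one-line pattern; it is just easier to maintain.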

Python demo:

import re
message = """[key    X] value
[key    X]  value value
[key    X]  value
value
value
value
[key     ] value
[key     ] ?
[key     ] ?"""

messageRegex = re.compile(r"^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)", re.M)

for value in messageRegex.findall(message):
    print(value)

Output:

('key    X', 'value')
('key    X', 'value value')
('key    X', 'value\nvalue\nvalue\nvalue')
('key     ', 'value')
('key     ', '?')
('key     ', '?')
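
If the multi-line values need further processing, one option is to split Group 2 on the newline and strip each piece. The following is a minimal sketch under that assumption; parse_entries, ENTRY_RE and log are hypothetical names, and stripping the key and the lines is a choice made for the example, not something the question asks for.

```python
import re

# Same regex as in the demo above; the names here are hypothetical.
ENTRY_RE = re.compile(r"^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)", re.M)

def parse_entries(text):
    """Return a list of (key, value_lines) pairs, one per [key] entry."""
    entries = []
    for match in ENTRY_RE.finditer(text):
        key = match.group(1).strip()  # drop padding inside the brackets
        # Group 2 may span several lines; split it into one item per line.
        lines = [line.strip() for line in match.group(2).split("\n")]
        entries.append((key, lines))
    return entries

log = "[key X]  value\nvalue\nvalue\n[key  ] ?"
print(parse_entries(log))
# [('key X', ['value', 'value', 'value']), ('key', ['?'])]
```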

Upvotes: 3
