AbreQueVoy

Reputation: 2316

How to match a pattern that starts with a character and ends at the same character, without including the closing character in the match?

If you find the title a bit cryptic, here's what I mean: I'm looking for every pattern that starts with the hash sign (#) and then matches everything after that sign until it reaches another hash or another defined entry. Neither the closing hash nor the other entry should be part of the match.

Given this example:

#one_liner = some cr4zy && weird stuff h3r3 $% ()
#multi_liner =some other s7uff,
    but put in (other) line
#one_liner_again = again, some stuff here...
LB: this line shouldn't be taken into consideration!
#multi_liner_again=You guessed:
going to another line!
<EOF>

I'd like to end up with four matches, for example as this set of tuples:

("one_liner", "some cr4zy && weird stuff h3r3 $% ()")
("multi_liner", "some other s7uff, but put in (other) line")
("one_liner_again", "again, some stuff here...")
("multi_liner_again", "You guessed: going to another line!")

I tried this pattern, but it doesn't return what I'm looking for:

#\w.+\s*=(\s*|\S*.+)\w.+\n*\s*.+(?=#)

Upvotes: 0

Views: 66

Answers (1)

mrxra

Reputation: 862

This might get you started. You'll probably want to strip newlines and unwanted whitespace from the captured values, though.

import re

data = """
#one_liner = some cr4zy && weird stuff h3r3 $% ()
#multi_liner =some other s7uff,
    but put in (other) line
#one_liner_again = again, some stuff here...
   LB: this line shouldn't be taken into consideration!
#multi_liner_again=You guessed: 
going to another line!
"""

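# Group 1: the key, i.e. everything after '#' up to the '='.
# Group 2: the value, i.e. every non-'#' character that is not directly followed by "LB:".
for m in re.findall(r"#([^#]+)\s*=\s*((?:[^#](?!LB:))*)", data, re.MULTILINE|re.DOTALL):
    print(m)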

prints:

('one_liner ', 'some cr4zy && weird stuff h3r3 $% ()\n')
('multi_liner ', 'some other s7uff,\n    but put in (other) line\n')
('one_liner_again ', 'again, some stuff here...\n  ')
('multi_liner_again', 'You guessed: \ngoing to another line!\n')
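
To get closer to the tuples asked for in the question, the whitespace cleanup can be folded into a small helper. The following is only a sketch, not a definitive solution: the names `ENTRY`, `parse_entries` and `sample` are made up for illustration, and the pattern assumes that an entry always starts with `#` at the beginning of a line, that keys consist of word characters only, and that `LB:` is the only other terminator. A lookahead is used so that the next `#` (or `LB:`) ends the value without being consumed by the match.

```python
import re

sample = """
#one_liner = some cr4zy && weird stuff h3r3 $% ()
#multi_liner =some other s7uff,
    but put in (other) line
#one_liner_again = again, some stuff here...
   LB: this line shouldn't be taken into consideration!
#multi_liner_again=You guessed: 
going to another line!
"""

# Assumption: entries begin with '#' at the start of a line and keys are \w+.
# The lazy (.*?) grows until the lookahead sees the next entry, an "LB:" line,
# or the end of the input; the lookahead itself consumes nothing.
ENTRY = re.compile(
    r"^#(\w+)\s*=\s*(.*?)(?=^#|^[ \t]*LB:|\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_entries(text):
    """Return (key, value) tuples with runs of whitespace collapsed to one space."""
    return [(key, " ".join(value.split())) for key, value in ENTRY.findall(text)]


for entry in parse_entries(sample):
    print(entry)
```

Because only a `#` at the start of a line ends a value here, a hash in the middle of a value would be kept as part of it. If any hash should terminate the value, the `^#` alternative in the lookahead would have to become a plain `#`.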

Upvotes: 2
