Reputation: 39
I'm having trouble extracting key-value pairs with my regex.
Code so far:
String raw = '''
MA1
D. Mueller Gießer
MA2 Peter
Mustermann 2. Mann
MA3 Ulrike Mastorius Schmelzer
MA4 Heiner Becker
s 3.Mann
MA5 Rudolf Peters
Gießer
'''
Map map = [:]
ArrayList<String> split = raw.findAll("(MA\\d)+(.*)"){ full, name, value -> map[name] = value }
println map
Output is: [MA1:, MA2: Peter, MA3: Ulrike Mastorius Schmelzer, MA4: Heiner Becker, MA5: Rudolf Peters]
In my case the keys are MA1, MA2, MA3, and so on, i.e. MA\d (MA followed by a single digit).
The value is absolutely everything up to the next key (including line breaks, tabs, spaces, etc.).
Does anybody have a clue how to do this?
Thanks in advance, Sebastian
Upvotes: 1
Views: 289
Reputation: 18641
Use
(?ms)^(MA\d+)(.*?)(?=\nMA\d|\z)
See proof.
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
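As a rough illustration of how this pattern could be used to fill a map, here is a minimal Java sketch. The class name `LazyKeyValueParser`, the `parse` helper, and the `trim()` call on each value are assumptions added for the example, not part of the answer; drop `trim()` if the surrounding whitespace must be preserved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyKeyValueParser {
    // (?m): ^ matches at the start of every line; (?s): . also matches line breaks.
    private static final Pattern ENTRY =
            Pattern.compile("(?ms)^(MA\\d+)(.*?)(?=\\nMA\\d|\\z)");

    // Hypothetical helper: collects key -> value pairs in order of appearance.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(raw);
        while (m.find()) {
            // trim() is an assumption: it strips the whitespace around each value.
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input from the question; \u00df is the letter "ß".
        String raw = "\nMA1\nD. Mueller Gie\u00dfer\nMA2 Peter\nMustermann 2. Mann\n"
                + "MA3 Ulrike Mastorius Schmelzer\nMA4 Heiner Becker\ns 3.Mann\n"
                + "MA5 Rudolf Peters\nGie\u00dfer\n";
        // Line breaks inside values are printed as a literal \n for readability.
        parse(raw).forEach((k, v) ->
                System.out.println(k + " -> " + v.replace("\n", "\\n")));
    }
}
```

With the question's sample, a value that spans two lines (for example the one for MA2) is kept as a single string containing the line break.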
Upvotes: 0
Reputation: 163577
You can capture in the second group everything that follows the key on the same line, plus all following lines that do not start with a key:
^(MA\d+)(.*(?:\R(?!MA\d).*)*)
The pattern matches:
^                      Start of string (or start of each line when multiline mode is enabled, which is needed to match every key)
(MA\d+)                Capture group 1, matching MA and 1+ digits
(                      Capture group 2
.*                     Match the rest of the line
(?:\R(?!MA\d).*)*      Match all lines that do not start with MA followed by a digit, where \R matches any Unicode newline sequence
)                      Close group 2
In Java, with the backslashes doubled (escaped):
final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
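A minimal sketch of how this regex might be applied in Java follows. The `Pattern.MULTILINE` flag is an assumption added here so that `^` matches at the start of each line rather than only at the start of the input. The class name `ContinuationLineParser`, the `parse` helper, and the `trim()` call are likewise illustrative and not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContinuationLineParser {
    // Hypothetical helper: collects key -> value pairs in order of appearance.
    static Map<String, String> parse(String raw) {
        final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
        // MULTILINE (an assumption, see above) lets ^ match at every line start.
        Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(raw);
        Map<String, String> result = new LinkedHashMap<>();
        while (matcher.find()) {
            // Group 1 is the key; group 2 is the rest of the key line plus all
            // continuation lines. trim() is an assumption to tidy the value.
            result.put(matcher.group(1), matcher.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample in the spirit of the question's input.
        String raw = "\nMA1\nD. Mueller\nMA2 Peter\nMustermann 2. Mann\nMA3 Ulrike\n";
        // Line breaks inside values are printed as a literal \n for readability.
        parse(raw).forEach((k, v) ->
                System.out.println(k + " -> " + v.replace("\n", "\\n")));
    }
}
```

Unlike the lazy `.*?` approach, this pattern does not need the dot to match line breaks: each continuation line is consumed explicitly by `\R` followed by `.*`.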
Upvotes: 3