blickfangQ2

Reputation: 39

Java/Groovy regex parse Key-Value pairs without delimiters

I am having trouble extracting key-value pairs with my regex.

Code so far:

String raw = '''
MA1

D. Mueller Gießer

MA2 Peter

Mustermann 2. Mann


MA3 Ulrike Mastorius Schmelzer

MA4 Heiner Becker
s 3.Mann

MA5 Rudolf Peters

Gießer

'''

Map map = [:]

ArrayList<String> split = raw.findAll("(MA\\d)+(.*)"){ full, name, value ->  map[name] = value }


println map

Output is: [MA1:, MA2: Peter, MA3: Ulrike Mastorius Schmelzer, MA4: Heiner Becker, MA5: Rudolf Peters]

In my case the keys are MA1, MA2, MA3 and so on, i.e. MA\d (MA followed by a single digit).

The value is absolutely everything up to the next key (including line breaks, tabs, spaces, etc.).

Does anybody have a clue how to do this?

Thanks in advance, Sebastian

Upvotes: 1

Views: 289

Answers (2)

Ryszard Czech

Reputation: 18641

Use

(?ms)^(MA\d+)(.*?)(?=\nMA\d|\z)


Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    MA                       'MA'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
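
As a rough illustration, the pattern can be applied from Java with a Matcher loop. This is only a sketch: the class name LazyLookaheadParser, the method name parse, the shortened sample input, and the decision to trim each value are assumptions made for the example and are not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class LazyLookaheadParser {

    // The pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern PAIR =
            Pattern.compile("(?ms)^(MA\\d+)(.*?)(?=\\nMA\\d|\\z)");

    // Returns the key/value pairs in the order they appear in the input.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(raw);
        while (m.find()) {
            // Assumption: surrounding whitespace is not wanted; drop trim() to keep it.
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample in the shape of the question's input.
        String raw = "\nMA1\n\nD. Mueller\n\nMA2 Peter\n\nMustermann 2. Mann\n\nMA3 Ulrike\n";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

Note that the lookahead expects a plain \n before the next key, so input with \r\n line endings would probably need \r?\n (or \R) in its place.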

Upvotes: 0

The fourth bird

Reputation: 163577

You can capture in the second group everything that follows the key on the same line, plus all subsequent lines that do not start with a key.

^(MA\d+)(.*(?:\R(?!MA\d).*)*)

The pattern matches

  • ^ Start of a line, provided the multiline flag is enabled (without it, ^ matches only at the start of the string)
  • (MA\d+) Capture group 1 matching MA and 1+ digits
  • ( Capture group 2
    • .* Match the rest of the line
    • (?:\R(?!MA\d).*)* Match all following lines that do not start with MA followed by a digit, where \R matches any Unicode newline sequence
  • ) Close group 2


In Java, with doubled backslashes (compile the pattern with Pattern.MULTILINE so that ^ matches at the start of every line):

final String regex = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";
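
A minimal sketch of how this regex string might be used to fill a map, assuming Java 8 or later (for \R). The class name LineBlockParser, the method name parse, the sample input, and the trimming of each value are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class LineBlockParser {

    static final String REGEX = "^(MA\\d+)(.*(?:\\R(?!MA\\d).*)*)";

    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    // DOTALL is deliberately not set: .* must stop at the end of each line.
    private static final Pattern PAIR = Pattern.compile(REGEX, Pattern.MULTILINE);

    // Returns the key/value pairs in the order they appear in the input.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(raw);
        while (m.find()) {
            // Assumption: leading/trailing whitespace of the value is not wanted.
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample in the shape of the question's input.
        String raw = "\nMA1\n\nD. Mueller\n\nMA4 Heiner Becker\ns 3.Mann\n\nMA5 Rudolf Peters\n";
        parse(raw).forEach((k, v) -> System.out.println(k + " -> [" + v + "]"));
    }
}
```

Because the value is consumed one whole line at a time, this variant avoids lazy character-by-character expansion; line breaks inside a value are kept, and only the outer whitespace is trimmed here.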

Upvotes: 3
