Robbie

Reputation: 101

Java regex positive lookahead

I've been having problems generating a regex for a particular string.

My source string is basically a set of key-value pairs, and I want to extract each key together with its value. Here is a sample string:

:27B:Hello: World!
     Something
     World: Hello
:29A:Test
:30:Something isn't right-}

Desired output:

Key: 27B  Value: Hello: World!
     Something
     World: Hello
Key: 29A  Value: Test
Key: 30   Value: Something isn't right

And here is my regex for it so far:

(\\d+\\w?):([\\w\\d\\s'/,:\\Q.()\\E]+(?=(:\\s*\\d+\\w?:|\\-\\})))

The problem is that I seem to be capturing the entire message.

   e.g. Key: 27B Value:Hello: World!
         Something
         World: Hello
    :29A:Test
    :30:Something isn't right

What should my regex be so that I can extract these key/value pairs?

Upvotes: 1

Views: 1299

Answers (2)

rvalvik

Reputation: 1559

You could try something like this:

Pattern p = Pattern.compile(":(\\d+\\w?):((?:[^:-]|:(?!\\d+\\w?:)|-(?!\\}))+)(?:-}[\\S\\s]*)?");
Matcher m = p.matcher(s);
while (m.find())
    System.out.print("Key: " + m.group(1) + " Value: " + m.group(2));

This produces your desired output. The pattern finds a key and then consumes characters until it reaches the start of another key. The last, optional group consumes -} and anything after it.
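For reference, here is a self-contained sketch of the same pattern run against the sample string from the question. The class name KeyValueParser and the parse helper are made up for illustration, and trimming the values is an assumption on my part (each captured value otherwise keeps the trailing line break before the next key). A map is used only for convenience; if keys can repeat in your data, collect the pairs into a list instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // Same pattern as above: a key, then value characters that neither
    // start a new ":key:" nor begin the "-}" terminator.
    private static final Pattern PAIR = Pattern.compile(
        ":(\\d+\\w?):((?:[^:-]|:(?!\\d+\\w?:)|-(?!\\}))+)(?:-}[\\S\\s]*)?");

    // Hypothetical helper: returns the pairs in order of appearance.
    static Map<String, String> parse(String s) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            // trim() drops the line break left at the end of each value
            pairs.put(m.group(1), m.group(2).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = ":27B:Hello: World!\n"
                 + "     Something\n"
                 + "     World: Hello\n"
                 + ":29A:Test\n"
                 + ":30:Something isn't right-}";
        parse(s).forEach((k, v) -> System.out.println("Key: " + k + " Value: " + v));
    }
}
```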

Edit:
If you want something closer to your original regex, you can use:

Pattern p = Pattern.compile("(\\d+\\w?):(.+?(?=(:\\s*\\d+\\w?:|\\-\\})))",Pattern.DOTALL);

Upvotes: 1

Ian Roberts

Reputation: 122364

+ is greedy, so [\\w\\d\\s'/,:\\Q.()\\E]+ will capture all characters up to the last point in the string at which the lookahead can match. To grab only up to the first such point you would need to use the "reluctant" version +? instead.
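A small sketch of the difference, run against the sample string from the question. To keep it short it uses . with Pattern.DOTALL in place of the question's character class (the class as written does not include !, which appears in the sample); the class name GreedyVsReluctant and the keys helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: builds the pattern with the given quantifier
    // ("+" or "+?") and returns the key of every match it finds.
    static List<String> keys(String quantifier, String input) {
        Pattern p = Pattern.compile(
            "(\\d+\\w?):(." + quantifier + "(?=(:\\s*\\d+\\w?:|-\\})))",
            Pattern.DOTALL);
        Matcher m = p.matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = ":27B:Hello: World!\n"
                 + "     Something\n"
                 + "     World: Hello\n"
                 + ":29A:Test\n"
                 + ":30:Something isn't right-}";
        // Greedy: the value runs to the LAST place the lookahead matches,
        // so the whole message ends up in a single match.
        System.out.println(keys("+", s));   // prints [27B]
        // Reluctant: the value stops at the FIRST place the lookahead matches.
        System.out.println(keys("+?", s));  // prints [27B, 29A, 30]
    }
}
```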

Upvotes: 3
