Reputation: 101
I've been having problems generating a regex for a particular string.
My source string is basically a set of key-value pairs. My desired output is to extract each key together with its value. Here is a sample string:
:27B:Hello: World!
Something
World: Hello
:29A:Test
:30:Something isn't right-}
Desired output:
Key: 27B Value: Hello: World!
Something
World: Hello
Key: 29A Value: Test
Key: 30 Value: Something isn't right
And here is my regex for it so far:
(\\d+\\w?):([\\w\\d\\s'/,:\\Q.()\\E]+(?=(:\\s*\\d+\\w?:|\\-\\})))
The problem is that I seem to be capturing the entire message.
e.g. Key: 27B Value:Hello: World!
Something
World: Hello
:29A:Test
:30:Something isn't right
What should my regex be so that I can extract these key/value pairs?
Upvotes: 1
Views: 1299
Reputation: 1559
You could try something like this:
Pattern p = Pattern.compile(":(\\d+\\w?):((?:[^:-]|:(?!\\d+\\w?:)|-(?!\\}))+)(?:-}[\\S\\s]*)?");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.print("Key: " + m.group(1) + " Value: " + m.group(2));
}
This produces your desired output. The last optional group is there to consume -} and anything after it. Basically, the pattern finds a key and then consumes all characters until it hits another key.
Edit:
If you want something closer to your original regex, you can use:
Pattern p = Pattern.compile("(\\d+\\w?):(.+?(?=(:\\s*\\d+\\w?:|\\-\\})))",Pattern.DOTALL);
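For reference, here is a minimal, self-contained sketch that runs the DOTALL pattern above against the sample string from the question. The class name KeyValueExtractor and the method extract are hypothetical names chosen for illustration. The trim() call is my own assumption: the reluctant match keeps the line break that precedes the next key, so you may or may not want to strip it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class KeyValueExtractor {
    // Key: digits plus an optional word character.
    // Value: the shortest run of characters (DOTALL lets '.' cross line
    // breaks) that is followed by the next ":key:" or by the closing "-}".
    static final Pattern PAIR = Pattern.compile(
            "(\\d+\\w?):(.+?(?=(:\\s*\\d+\\w?:|\\-\\})))", Pattern.DOTALL);

    // Collects the pairs in the order they appear in the message.
    static Map<String, String> extract(String message) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(message);
        while (m.find()) {
            // Assumption: the trailing line break before the next key is unwanted.
            pairs.put(m.group(1), m.group(2).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String s = ":27B:Hello: World!\nSomething\nWorld: Hello\n"
                + ":29A:Test\n"
                + ":30:Something isn't right-}";
        extract(s).forEach((k, v) -> System.out.println("Key: " + k + " Value: " + v));
    }
}
```

A LinkedHashMap is used only so that the keys come back in message order. If the same key can appear more than once, a list of pairs would be a safer choice.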
Upvotes: 1
Reputation: 122364
+ is greedy, so [\\w\\d\\s'/,:\\Q.()\\E]+ will capture all characters up to the last point in the string at which the lookahead can match. To grab only up to the first such point, you would need to use the "reluctant" version, +?, instead.
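To make the difference concrete, here is a small sketch that compares the two quantifiers on the tail of the sample message. It uses a simplified stand-in for the pattern in the question (a plain . with DOTALL instead of the character class), and the class name GreedyVsReluctant and the method firstValue are hypothetical names for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are simplified, not the asker's exact regex.
class GreedyVsReluctant {
    // Both patterns stop where the lookahead sees the next ":key:" or "-}".
    // They differ only in the quantifier applied to the value.
    static final Pattern GREEDY = Pattern.compile(
            "(\\d+\\w?):(.+)(?=:\\d+\\w?:|-\\})", Pattern.DOTALL);
    static final Pattern RELUCTANT = Pattern.compile(
            "(\\d+\\w?):(.+?)(?=:\\d+\\w?:|-\\})", Pattern.DOTALL);

    // Returns the value captured by the first match, or null if there is none.
    static String firstValue(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String s = ":29A:Test\n:30:Something isn't right-}";
        // Greedy: runs to the LAST place the lookahead matches (just before "-}"),
        // so the value swallows the ":30:" key as well.
        System.out.println("[" + firstValue(GREEDY, s) + "]");
        // Reluctant: stops at the FIRST place the lookahead matches (before ":30:").
        System.out.println("[" + firstValue(RELUCTANT, s) + "]");
    }
}
```

With the greedy version, the first value contains the whole remainder of the message up to -}, which is the behaviour described in the question. With the reluctant version it ends right before :30:.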
Upvotes: 3