Reputation: 121
I have text like this:
:33: 88 app 2/8
superman taman, puchong
36100, Malaysia
:22: bla bla \bla /bla
bla bla
:32: 45//dsfd//qdsfqsdf
:72D: Example
The text has the form :key:value. A value can span one or more lines.
I tried the regex (:[0-9]{2}[A-Z]?:)(.*) but it captures only the first line of a multiline value. When I add the Pattern.DOTALL option, the first match swallows all the remaining text as the value of the first key.
What would the correct regex be?
Upvotes: 2
Views: 290
Reputation: 627022
You may use
(?m)^(:\d{2}[A-Z]?:)(.*(?:\r?\n(?!:\d{2}[A-Z]?:).*)*)
See the regex demo. Do not use Pattern.DOTALL: with it, .* also matches line breaks and runs to the end of the text.
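To make the difference concrete, here is a minimal sketch that runs the question's pattern with and without DOTALL, and then the pattern above, on a short input. The class name DotallDemo, the helper firstValue, and the sample string are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns group 2 (the value) of the first match, or null if nothing matches.
    static String firstValue(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String s = ":33: line one\nline two\n:22: other";
        String original = "(:[0-9]{2}[A-Z]?:)(.*)";
        String fixed = "(?m)^(:\\d{2}[A-Z]?:)(.*(?:\r?\n(?!:\\d{2}[A-Z]?:).*)*)";

        // Without DOTALL, .* stops at the first line break.
        System.out.println("[" + firstValue(original, 0, s) + "]");
        // With DOTALL, .* runs to the end of the input, across the next key.
        System.out.println("[" + firstValue(original, Pattern.DOTALL, s) + "]");
        // The fixed pattern consumes lines until one starts with a key.
        System.out.println("[" + firstValue(fixed, 0, s) + "]");
    }
}
```

The first call yields only " line one", the second yields everything up to and including ":22: other", and the third yields the two lines that belong to :33:.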
Details
- (?m)^ - matches the start of a line
- (:\d{2}[A-Z]?:) - Group 1:
  - : - a colon
  - \d{2} - 2 digits
  - [A-Z]? - 1 or 0 uppercase ASCII letters
  - : - a colon
- (.*(?:\r?\n(?!:\d{2}[A-Z]?:).*)*) - Group 2:
  - .* - the rest of the line (0 or more chars other than line break chars)
  - (?:\r?\n(?!:\d{2}[A-Z]?:).*)* - zero or more sequences of:
    - \r?\n(?!:\d{2}[A-Z]?:) - a line break (in Java 8 and later, \r?\n can be replaced with \R) that is not followed by the pattern used in Group 1
    - .* - the rest of the line
In Java, use
String pat = "(?m)^(:\\d{2}[A-Z]?:)(.*(?:\r?\n(?!:\\d{2}[A-Z]?:).*)*)";
See Java demo:
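import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = ":33: 88 app 2/8\nsuperman taman, puchong\n36100, Malaysia\n:22: bla bla \\bla /bla\nbla bla\n:32: 45//dsfd//qdsfqsdf\n:72D: Example";
        // (?m) lets ^ match at every line start; DOTALL is deliberately not set
        Pattern pattern = Pattern.compile("(?m)^(:\\d{2}[A-Z]?:)(.*(?:\r?\n(?!:\\d{2}[A-Z]?:).*)*)");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println("--- NEXT PAIR ---");
            System.out.println("Key:" + matcher.group(1));   // e.g. ":33:"
            System.out.println("Value:" + matcher.group(2)); // may span several lines
        }
    }
}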
Output:
--- NEXT PAIR ---
Key::33:
Value: 88 app 2/8
superman taman, puchong
36100, Malaysia
--- NEXT PAIR ---
Key::22:
Value: bla bla \bla /bla
bla bla
--- NEXT PAIR ---
Key::32:
Value: 45//dsfd//qdsfqsdf
--- NEXT PAIR ---
Key::72D:
Value: Example
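If the pairs are needed as data rather than printed, one option is to collect them into a map. The following is only a sketch: the class name KeyValueParser and the method parse are hypothetical, values are trimmed to drop the space after the key, and it assumes keys are unique (a repeated key would overwrite the earlier value).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {
    // Same pattern as in the answer, written with regex escapes for the line break.
    private static final Pattern PAIR =
        Pattern.compile("(?m)^(:\\d{2}[A-Z]?:)(.*(?:\\r?\\n(?!:\\d{2}[A-Z]?:).*)*)");

    // Maps each key (e.g. ":33:") to its possibly multiline value, in input order.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            // trim() removes the leading space that follows the key
            result.put(m.group(1), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = ":33: 88 app 2/8\nsuperman taman, puchong\n36100, Malaysia\n:32: 45//dsfd//qdsfqsdf";
        parse(s).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A LinkedHashMap is used here so that iteration follows the order in which the keys appear in the text.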
Upvotes: 3