Reputation: 241
I am trying to parse the following text in Java:
0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;2:ID IN
(73150,73150) AND TOTAL>=0 AND TOTAL<=99999
I need the text between each number-plus-colon prefix and the following semicolon (or the end of the line):
0:<--this-->;1:<--this-->;2:<--this-->
- ID IN (1002,25)
- ID IN (2,3,4) AND COQ>=0 AND COQ<=9
- ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999
An additional problem: there may be whitespace between the numbers and the colons:
0 :ID IN (1002,25); 1 :ID IN (2,3,4) AND COQ>=0 AND COQ<=9;2:ID IN
(73150,73150) AND TOTAL>=0 AND TOTAL<=99999
I tried (?<=[\d]:).*(?=;|$)
and (?<=:).*(?=;|$)
but neither solves the problem: the greedy .* runs from the first colon to the end of the text, so the number-colon prefixes in between are ignored.
They would also not ignore numbers with colons that appear inside quoted strings (a further problem, but a minor one in my case):
0 :NAME = '3:;' OR NAME = "0 : ;" ; 1 :CO>=0;2:TOTAL<=99999
- NAME = '3:;' OR NAME = "0 : ;"
- CO>=0
- TOTAL<=99999
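To illustrate the first failure, here is a minimal sketch (the class and method names are hypothetical) that runs both attempted patterns on the first example, written on a single line. In both cases the greedy .* runs to the end of the input, where $ satisfies the lookahead, so everything after the first colon comes back as one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyLookaroundDemo {
    // Collects every match of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;"
                + "2:ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
        String[] attempts = {"(?<=[\\d]:).*(?=;|$)", "(?<=:).*(?=;|$)"};
        for (String regex : attempts) {
            // Greedy .* runs to the end of the input, where $ satisfies the lookahead,
            // so everything after "0:" comes back as one single match.
            List<String> matches = allMatches(regex, s);
            System.out.println(matches.size() + " match: " + matches.get(0));
        }
    }
}
```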
It would be great if you have some good advice for solving this tricky problem. Many thanks, pavoo
Upvotes: 3
Views: 83
Reputation: 627468
You can use a more complex regex like this to consider all your corner cases:
(?:^\s*(?!\s*\d+\s*:)|\d+\s*:)((?:'[^']*'|"[^"]*"|[^;])+)
The (?:^\s*(?!\s*\d+\s*:)|\d+\s*:) subpattern matches the start of each chunk: either the beginning of the string plus optional whitespace, as long as it is not followed by optional whitespace, digits, optional whitespace and a colon, or digits followed by optional whitespace and a colon. The capturing group then matches one or more single-quoted strings '...', double-quoted strings "...", or characters other than ;, so a ; inside quotes does not end the match.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;2:ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
        Pattern pattern = Pattern.compile("(?:^\\s*(?!\\s*\\d+\\s*:)|\\d+\\s*:)((?:'[^']*'|\"[^\"]*\"|[^;])+)");
        Matcher matcher = pattern.matcher(s);
        // group(1) holds the text after each "N:" prefix, up to the next unquoted ';'
        while (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}
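As a quick check on the quoted example from the question, this sketch (hypothetical class and method names) runs the same pattern and collects the group 1 values. They are trimmed because group 1 keeps the space that sits before the semicolon after the quoted part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedClauseDemo {
    // Same pattern as above: an "N:" prefix (or a prefix-less start), then quoted strings or non-';' characters.
    private static final Pattern CLAUSE = Pattern.compile(
            "(?:^\\s*(?!\\s*\\d+\\s*:)|\\d+\\s*:)((?:'[^']*'|\"[^\"]*\"|[^;])+)");

    // Returns the trimmed group-1 text of every match.
    static List<String> clauses(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CLAUSE.matcher(input);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "0 :NAME = '3:;' OR NAME = \"0 : ;\" ; 1 :CO>=0;2:TOTAL<=99999";
        for (String clause : clauses(s)) {
            System.out.println(clause);
        }
    }
}
```

It should print NAME = '3:;' OR NAME = "0 : ;", then CO>=0, then TOTAL<=99999. The ; characters inside the quotes do not end the first clause because the quoted alternatives are tried before [^;].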
Upvotes: 1
Reputation: 48837
Playing with lookarounds, this one should suit your needs:
(?<=:).*?(?=;|$)
Don't forget to enable DOTALL mode, so that . also matches line breaks.
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("(?<=:).*?(?=;|$)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(yourInputString);
while (matcher.find()) {
    System.out.println(matcher.group());
}
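The DOTALL flag matters if the input contains the line break shown in the question before (73150,73150). This sketch (hypothetical names, and assuming that line break is really part of the input) counts the matches of the lazy pattern with and without the flag. Without DOTALL, . cannot cross the line break and $ only matches at the very end of the input, so the last clause is never found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Counts matches of the lazy lookaround pattern compiled with the given flags.
    static int countMatches(String input, int flags) {
        Matcher m = Pattern.compile("(?<=:).*?(?=;|$)", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The question's first example, including the line break before "(73150,73150)".
        String s = "0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;2:ID IN\n"
                + "(73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
        System.out.println(countMatches(s, Pattern.DOTALL)); // 3
        System.out.println(countMatches(s, 0));              // 2: the last clause is lost
    }
}
```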
Upvotes: 2
Reputation: 7357
I came up with this solution; the only catch is that the 0: prefix stays on the first array element, so it has to be removed separately:
String s = "0 :NAME = '3:;' OR NAME = \"0 : ;\" ; 1 :CO>=0;2:TOTAL<=99999";
// Split on ';' followed by an index prefix such as "1:" or " 1 :"
for (String s2 : s.split(";\\s*\\d+\\s*:")) {
    // The first element still starts with its "0:" prefix, so strip it
    System.out.println(s2.replaceAll("^\\s*\\d+\\s*:", ""));
}
s = "0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;2:ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
for (String s2 : s.split(";\\s*\\d+\\s*:")) {
    System.out.println(s2.replaceAll("^\\s*\\d+\\s*:", ""));
}
As far as I can tell, this produces the correct results for both examples.
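The split-then-strip idea can also be wrapped in a small helper. This sketch (hypothetical names) uses \d+ rather than a single \d, so that two-digit indices such as 10: are recognized too; with a single \d, the delimiter ;10: would not match and the clauses would stay joined.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitClausesDemo {
    // Splits on ';' followed by an index prefix, then strips the prefix left on the first piece.
    // \\d+ (rather than \\d) lets multi-digit indices such as "10:" act as delimiters.
    static List<String> clauses(String input) {
        List<String> result = new ArrayList<>();
        for (String part : input.split(";\\s*\\d+\\s*:")) {
            result.add(part.replaceFirst("^\\s*\\d+\\s*:", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input with a two-digit index.
        System.out.println(clauses("9:A=1;10:B=2")); // [A=1, B=2]
    }
}
```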
Upvotes: 1
Reputation: 89639
You can split your string in this way:
String[] parts = s.split(";?\\s*\\d+\\s*:");
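One detail to be aware of: the input starts with 0:, which is itself a non-empty match of the delimiter, so String.split puts an empty string at index 0 of parts. A minimal sketch (hypothetical names) that drops empty elements:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitLeadingEmptyDemo {
    // Splits on an optional ';' plus an index prefix, then drops empty strings
    // (the leading "0:" produces an empty first element).
    static List<String> clauses(String s) {
        return Arrays.stream(s.split(";?\\s*\\d+\\s*:"))
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "0:ID IN (1002,25);1:ID IN (2,3,4) AND COQ>=0 AND COQ<=9;"
                + "2:ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
        System.out.println(s.split(";?\\s*\\d+\\s*:").length); // 4, first element is ""
        System.out.println(clauses(s).size());                  // 3
    }
}
```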
Upvotes: 1
Reputation: 96018
Try the following regex:
\d+\s*:([^;]+)
The first capturing group holds the text you want.
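A minimal sketch of reading that group with Matcher (hypothetical class and method names), run on the whitespace example from the question. Note that [^;]+ does not know about quotes, so on the quoted example the first clause would stop at the ; inside '3:;'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    private static final Pattern CLAUSE = Pattern.compile("\\d+\\s*:([^;]+)");

    // Collects group 1 (the text after each "N:" prefix) for every match.
    static List<String> clauses(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CLAUSE.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The question's example with whitespace before the colons.
        String s = "0 :ID IN (1002,25); 1 :ID IN (2,3,4) AND COQ>=0 AND COQ<=9;"
                + "2:ID IN (73150,73150) AND TOTAL>=0 AND TOTAL<=99999";
        clauses(s).forEach(System.out::println);
    }
}
```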
Upvotes: 1