Reputation: 151
I'm trying to split a string of comma-delimited key=value pairs, but I can't work out how to handle commas that appear inside a value.
Here is my test case -
private void stringSplit() {
    String value = "{aaa=1111,bbb=2222,ccc=3333}";
    String regEx = "[^,]+=[^,]+";
    String separator = "=";
    final Pattern pattern = Pattern.compile(regEx);
    final Matcher matcher = pattern.matcher(value);
    while (matcher.find()) {
        final String group = matcher.group();
        // Everything before the first separator is the key, the rest is the value.
        final String key = group.substring(0, group.indexOf(separator));
        final String val = group.substring(group.indexOf(separator) + separator.length());
        System.out.println("key [" + key + "], val [" + val + "]");
    }
}
and here are my results -
key [{aaa], val [1111]
key [bbb], val [2222]
key [ccc], val [3333}]
All good so far...
But there may be a comma in the numeric value i.e.
"{aaa=11,11,bbb=2222,ccc=333,3}";
the results I would want are -
key [{aaa], val [11,11]
key [bbb], val [2222]
key [ccc], val [333,3}]
Could any of you regular expression gurus help me out here?
Thanks!
EDIT
Following on from @bmorris591's further comments:
Ok, I have a final query - and this is a definitive list of what this crazy regex (+ a bit of java code) needs to handle.
Here is my code -
private void stringSplit() {
    String value = "{1=\"1, one\", 22=\"+t,w,o\", 333=\"three, \"3\", -33,,333,\", 4444=\"four. '4-4, (44), -44\"}, 555=\"\", \"666\"=6666, \"777\"=\"7777\"}";
    String regex = "[^\\{,]+=([[\\w]\\(\\)\\-\\+\\.\"'\\s,]+)[,}]";
    String separator = "=";
    final Pattern pattern = Pattern.compile(regex);
    final Matcher matcher = pattern.matcher(value);
    while (matcher.find()) {
        final String group = matcher.group();
        showKeyAndValue(group, separator);
    }
}

private void showKeyAndValue(final String group, final String keyValueSeparator) {
    System.out.println("group [" + group + "]");
    final int separatorIndex = group.indexOf(keyValueSeparator);
    final String key = removeQuotesFromString(group.substring(0, separatorIndex));
    final String val = removeQuotesFromString(
            group.substring(separatorIndex + keyValueSeparator.length()));
    System.out.println("key [" + key + "], val [" + val + "]");
}

private String removeQuotesFromString(final String str) {
    String returnString = str.trim();
    if (returnString.startsWith("\"")) {
        // Keep only what lies between the first and the last double quote.
        returnString = returnString.substring(
                returnString.indexOf("\"") + 1, returnString.lastIndexOf("\""));
    }
    return returnString;
}
And here are the results -
group [1="1, one",]
key [1], val [1, one]
group [ 22="+t,w,o",]
key [22], val [+t,w,o]
group [ 333="three, "3", -33,,333,",]
key [333], val [three, "3", -33,,333,]
group [ 4444="four. '4-4, (44), -44"}]
key [4444], val [four. '4-4, (44), -44]
group [ 555="",]
key [555], val []
group [ "666"=6666,]
key [666], val [6666,]
group [ "777"="7777"}]
key [777], val [7777]
All results are correct apart from key 666: as you can see, its value has a trailing comma. I could just strip this off for any value that is not enclosed in quotes (basically a number), but I was wondering whether this could be achieved in the regex itself, as that would be a 'cleaner' solution...
Many, many thanks if you can think of anything.
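One possible way to get rid of the trailing comma, sketched here under some assumptions: the code above prints the whole match (matcher.group()), which includes the closing ',' or '}', whereas the regex already has a capturing group around the value. The sketch below keeps the question's character classes, additionally wraps the key in a group, and reads group(1) and group(2) instead of the whole match. The names PairExtractor, SAMPLE, PAIR, parse and stripQuotes are made up for illustration, and the pattern has only been traced against the sample string from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairExtractor {
    // Sample input copied from the question (hypothetical constant name).
    static final String SAMPLE = "{1=\"1, one\", 22=\"+t,w,o\", 333=\"three, \"3\", -33,,333,\", 4444=\"four. '4-4, (44), -44\"}, 555=\"\", \"666\"=6666, \"777\"=\"7777\"}";

    // Same character classes as the question's regex, but the key is group 1 and
    // the value is group 2; the trailing ',' or '}' stays outside both groups.
    static final Pattern PAIR =
            Pattern.compile("([^\\{,]+)=([\\w()\\-+.\"'\\s,]+)[,}]");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            pairs.put(stripQuotes(matcher.group(1)), stripQuotes(matcher.group(2)));
        }
        return pairs;
    }

    // Trims the text and removes one pair of enclosing double quotes, if present.
    static String stripQuotes(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        parse(SAMPLE).forEach((k, v) ->
                System.out.println("key [" + k + "], val [" + v + "]"));
    }
}
```

With the sample string this should give 6666 for key 666, without the comma, while the quoted values come out as before. It still relies on '=' and '}' never appearing inside a value, just like the original regex.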
Upvotes: 1
Views: 398
Reputation: 61148
You can use the magic of negative lookahead. To split your string on every comma that is not followed by a digit, use:
public static void main(String[] args) {
    final String s = "{aaa=11,11,bbb=2222,ccc=333,3}";
    // Strip the surrounding braces, then split on commas not followed by a digit.
    final String[] ss = s.substring(1, s.length() - 1).split(",(?!\\d)");
    for (final String str : ss) {
        System.out.println(str);
    }
}
Output
aaa=11,11
bbb=2222
ccc=333,3
You can easily expand this to yank the key=value pairs directly
public static void main(String[] args) {
    final String s = "{aaa=11,11,bbb=2222,ccc=333,3}";
    // Group 1 is the key, group 2 is the value (digits and commas).
    final Pattern p = Pattern.compile("([A-Za-z]++)=([\\d,]+)(?!\\d)[,}]");
    final Matcher matcher = p.matcher(s);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        System.out.println("DONE");
    }
}
Output
aaa
11,11
DONE
bbb
2222
DONE
ccc
333,3
DONE
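If the pairs are needed as a map rather than printed, the lookahead split can be combined with a split on the first '=' only. This is just a sketch: the class name LookaheadSplit and the method parse are invented for illustration, and it assumes the simple numeric format of the original question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LookaheadSplit {
    // Splits "{k=v,k=v}" on commas that are not followed by a digit and
    // stores the pairs in insertion order.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        String body = input.substring(1, input.length() - 1); // drop '{' and '}'
        for (String pair : body.split(",(?!\\d)")) {
            // Limit 2: only the first '=' separates key from value.
            String[] kv = pair.split("=", 2);
            pairs.put(kv[0], kv[1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {aaa=11,11, bbb=2222, ccc=333,3}
        System.out.println(parse("{aaa=11,11,bbb=2222,ccc=333,3}"));
    }
}
```

One caveat: a comma followed by a digit is never treated as a separator, so a key that starts with a digit would be glued onto the previous value.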
EDIT
Following the OP's comment:
The value part of the pair is alphanumeric (including ,+-*/=()), also the value is always encapsulated in quotes, there could be multiple occurrences of ,+-*/=() too...
I have revised the expression:
public static void main(String[] args) {
    final String s = "{1=\"1, one\", 22=\"+t,w,o\", 333=\"three, 3, -33,,333\", 4444=\"four. 4-4, (44), -44\"}";
    System.out.println("String is: " + s);
    // Group 1 is the key, group 2 is the text between the double quotes.
    final Pattern p = Pattern.compile("([^{=,\\s]++)=\"([^\"]++)\"");
    final Matcher matcher = p.matcher(s);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        System.out.println("DONE");
    }
}
Output:
String is: {1="1, one", 22="+t,w,o", 333="three, 3, -33,,333", 4444="four. 4-4, (44), -44"}
1
1, one
DONE
22
+t,w,o
DONE
333
three, 3, -33,,333
DONE
4444
four. 4-4, (44), -44
DONE
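The pattern will now match any run of characters other than '=', ',', '{' or whitespace, followed by an '=', and then followed by a run of characters other than '"' enclosed in double quotes. Note that the quoted value must contain at least one character, so an empty value ("") is not matched.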
Does this help?
Upvotes: 2
Reputation: 151
@bmorris591
Thanks for your reply.
Sorry, but looking back, my original post was a little too simplistic.
The value part of the pair is alphanumeric (including ",+-*/=()"). The value is also always encapsulated in quotes, and there could be multiple occurrences of ",+-*/=()" too...
i.e.
"{1=\"1 one\", 22=\"two\", 333=\"three 3\"}"
"{1=\"1, one\", 22=\"+t,w,o\", 333=\"three, 3, -33,,333\", 4444=\"four. 4-4, (44), -44\"}"
Because of the complexity of this, I think the simplest solution is to replace all occurrences of a comma with some marker character before the pair string is constructed, run the regex, and then restore the commas in each value...
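A rough sketch of that marker idea, for illustration only. The marker character (U+0001), the class name MarkerRoundTrip and the encode/decode helpers are all hypothetical, the marker is assumed never to occur in real keys or values, and the quotes around the values are left out to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class MarkerRoundTrip {
    // Hypothetical marker; assumed never to appear in real keys or values.
    static final char MARKER = '\u0001';

    // Builds "{k=v,k=v}" with every comma inside a value replaced by MARKER.
    static String encode(Map<String, String> pairs) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue().replace(',', MARKER));
        }
        return joiner.toString();
    }

    // Splits on the remaining commas, then restores the commas in each value.
    static Map<String, String> decode(String encoded) {
        Map<String, String> pairs = new LinkedHashMap<>();
        String body = encoded.substring(1, encoded.length() - 1);
        if (body.isEmpty()) {
            return pairs;
        }
        for (String pair : body.split(",")) {
            // Limit 2: a value may itself contain '='.
            String[] kv = pair.split("=", 2);
            pairs.put(kv[0], kv[1].replace(MARKER, ','));
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("1", "1, one");
        in.put("22", "+t,w,o");
        in.put("333", "three, 3, -33,,333");
        System.out.println(decode(encode(in)).equals(in));
    }
}
```

This only works if the code that builds the pair string is under your control, since the commas have to be swapped out before the values are joined.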
Thank you for your reply to my initial post though as it is a solution to my original question...
Upvotes: 0