Reputation: 13
String data = "12-Jan,TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI ,0,\"50,000.00 CR\",\"3,583,090.00\" ";
System.out.println(data);
Output:
12-Jan,TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI ,0,"50,000.00 CR","3,583,090.00"
String[] items = data.split(",");
System.out.println(new Gson().toJson(items));
Output:
["12-Jan","TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI ","0","\"50","000.00 CR\"","\"3","583","090.00\""]
How can I split on commas while leaving the commas inside the double quotes alone?
Expected output:
["12-Jan","TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI","0","50,000.00 CR","3,583,090.00"]
Upvotes: 1
Views: 96
Reputation:
// requires: import java.util.Arrays;
String data = "12-Jan,TRSF E-BANKING CR 12/01 95031 NABUNG" +
" M1 DES AGUS JENI ,0,\"50,000.00 CR\",\"3,583,090.00\" ";
// any sequence of characters between quotes,
// or one or more consecutive commas
String pattern = "\".*?\"|,+";
String[] arr1 = data
// append substrings that matches the given
// pattern with additional delimiter characters
.replaceAll(pattern, "$0::::")
// remove comma
.replace(",::::", "::::")
// split into an array
// by delimiter characters
.split("::::", 0);
// remove leading and trailing
// whitespaces and empty strings,
// replace sequence of whitespace
// characters with a single whitespace
String[] arr2 = Arrays.stream(arr1)
.map(String::trim)
.map(str -> str.replaceAll("\\s+", " "))
.filter(str -> str.length() > 0)
.toArray(String[]::new);
// output in a column
Arrays.stream(arr2).forEach(System.out::println);
// 12-Jan
// TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI
// 0
// "50,000.00 CR"
// "3,583,090.00"
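The result above still carries the surrounding double quotes, whereas the expected output in the question does not. A minimal sketch of one extra step that strips them, assuming fields never contain escaped quotes; the class and method names (QuoteStripper, stripQuotes) are illustrative only and not part of the code above:

```java
import java.util.Arrays;

public class QuoteStripper {
    // Removes one leading and one trailing double quote from each field, if present.
    // Assumption: fields do not contain escaped quotes such as "".
    static String[] stripQuotes(String[] fields) {
        return Arrays.stream(fields)
                .map(s -> s.replaceAll("^\"|\"$", ""))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // a shortened stand-in for arr2 from the snippet above
        String[] arr2 = {"12-Jan", "0", "\"50,000.00 CR\"", "\"3,583,090.00\""};
        System.out.println(Arrays.toString(stripQuotes(arr2)));
        // [12-Jan, 0, 50,000.00 CR, 3,583,090.00]
    }
}
```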
Upvotes: 0
Reputation: 521249
As the comment by @Thomas above suggests, a good CSV parser that can be told to treat the double quote as the quoting character is probably the best way to go here. If you're stuck doing this from scratch in Java, a regular expression (regex) can be used, with the following pattern:
".*?"|[^,]+
This tries to match a double-quoted term first. Only if no quoted term starts at the current position does it consume everything up to the next comma separator. We can use a formal Java regex pattern matcher here:
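// requires: import java.util.ArrayList; import java.util.List;
//           import java.util.regex.Matcher; import java.util.regex.Pattern;
List<String> terms = new ArrayList<>();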
String data = "12-Jan,TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI ,0,\"50,000.00 CR\",\"3,583,090.00\" ";
String pattern = "\".*?\"|[^,]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(data.trim());
while (m.find()) {
terms.add(m.group(0).trim());
}
System.out.println(terms);
This prints:
[12-Jan, TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI, 0, "50,000.00 CR", "3,583,090.00"]
Upvotes: 3
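Both regex approaches above keep the surrounding quotes and skip empty fields. If pulling in a CSV library is not an option, a small hand-written scanner is another route. The following is only a sketch under stated assumptions: no escaped quotes ("") inside a field and no line breaks inside quotes. The names QuotedCsvLine and split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvLine {
    // Splits one CSV line on commas that are outside double quotes.
    // Assumptions: no escaped quotes ("") inside fields, no embedded line breaks.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle state, drop the quote itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);          // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String data = "12-Jan,TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI ,0,"
                + "\"50,000.00 CR\",\"3,583,090.00\" ";
        System.out.println(split(data));
        // [12-Jan, TRSF E-BANKING CR 12/01 95031 NABUNG M1 DES AGUS JENI, 0, 50,000.00 CR, 3,583,090.00]
    }
}
```

Unlike the regex versions, this keeps empty fields (a,,b yields three entries) and returns the quoted values without their quotes, which matches the expected output in the question. For real-world files with escaped quotes or multi-line fields, a proper CSV parser remains the safer choice.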