rvd

Reputation: 337

Java : Splitting a String using Regex

I have to split a string using a comma (,) as the separator while ignoring any comma that is inside quotes (").

fieldSeparator : ,
fieldGrouper : "

The string to split is: "1","2",3,"4,5"

I am able to achieve this as follows:

String record = "\"1\",\"2\",3,\"4,5\"";
String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

Output:

"1"
"2"
3
"4,5"
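To see why this split works: the lookahead (?=([^"]*"[^"]*")*[^"]*$) succeeds only when the text after a comma contains an even number of quote characters, which is the case exactly when the comma sits outside a quoted field (assuming quotes are balanced and never escaped). The sketch below, whose class and method names are illustrative only, counts the quotes after each comma by hand and cross-checks the count against the split.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteParity {
    // Indices of commas followed by an even number of quotes: the commas the
    // lookahead accepts as separators (balanced, unescaped quotes assumed).
    static List<Integer> separatorIndices(String record) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < record.length(); i++) {
            if (record.charAt(i) != ',') {
                continue;
            }
            int quotesAfter = 0;
            for (int j = i + 1; j < record.length(); j++) {
                if (record.charAt(j) == '"') {
                    quotesAfter++;
                }
            }
            if (quotesAfter % 2 == 0) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        System.out.println(separatorIndices(record)); // [3, 7, 9]
        System.out.println(tokens.length);            // 4 (one more than the separator count)
    }
}
```

The comma at index 12 (inside "4,5") has only one quote after it, so it is not a separator.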

Now the challenge is that the fieldGrouper (") should not be part of the split tokens. I can't figure out the regex for this.

The expected output of the split is:

1
2
3
4,5
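A minimal sketch of one way to get this output without a new regex, assuming fields never contain escaped (doubled) quotes: keep the split above and strip one leading and one trailing fieldGrouper from each token afterward. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitAndStrip {
    // Split on commas outside quotes (the lookahead from the question), then
    // remove a surrounding pair of quotes from each token. The -1 limit keeps
    // trailing empty fields instead of discarding them.
    static String[] splitRecord(String record) {
        String[] tokens = record.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
                tokens[i] = t.substring(1, t.length() - 1);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        System.out.println(Arrays.toString(splitRecord(record))); // [1, 2, 3, 4,5]
    }
}
```

Unquoted tokens such as 3 pass through unchanged, because only a token that both starts and ends with a quote is trimmed.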

Upvotes: 8

Views: 451

Answers (4)

m.cekiera

Reputation: 5395

I would try this kind of workaround:

String record = "\"1\",\"2\",3,\"4,5\"";
record = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\""," ");
String[] tokens = record.trim().split(" ");
for(String str : tokens){
    System.out.println(str);
}

Output:

1
2
3
4,5
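Because this workaround turns every separator into a space and then splits on spaces, it assumes no field value contains a space. It also relies on the \w in the lookbehind, so a quoted value with a non-word character before an inner comma (for example "a-b,c") is split as well. A quick check, wrapping the same two steps in a method with a hypothetical name:

```java
import java.util.Arrays;

public class SpaceWorkaroundCheck {
    // Same replaceAll + split steps as the answer above, wrapped in a method.
    static String[] split(String record) {
        String replaced = record.replaceAll("\"?(?<!\"\\w{1,9999}),\"?|\"", " ");
        return replaced.trim().split(" ");
    }

    public static void main(String[] args) {
        // Works for the question's input.
        System.out.println(Arrays.toString(split("\"1\",\"2\",3,\"4,5\""))); // [1, 2, 3, 4,5]
        // A space inside a quoted field is treated as a separator.
        System.out.println(Arrays.toString(split("\"a b\",c")));             // [a, b, c]
    }
}
```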

Upvotes: 1

Matt

Reputation: 1308

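My proposal:

String record = "\"1\",\"2\",3,\"4,5\"";
record = record.replaceAll("\",", "|");   // closing quote + separator -> |
record = record.replaceAll(",\\\"", "|"); // separator + opening quote -> |
record = record.replaceAll("\"", "");     // drop the remaining quotes

String[] tokens = record.split("\\|");

for (String token : tokens) {
    System.out.println(token);
}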

Upvotes: 0

Enteleform

Reputation: 3823

Update:

String[] tokens = record.split( "(,*\",*\"*)" );
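One detail worth checking when running this split: the record starts with a quote, so the pattern makes a positive-width match at index 0, and String.split keeps the empty leading string such a match produces (the trailing empty string is discarded). A minimal sketch that drops it, with hypothetical class and method names:

```java
import java.util.Arrays;

public class UpdateSplitCheck {
    // Applies the split from the update, then drops the empty leading token
    // produced by the positive-width match of the opening quote at index 0.
    static String[] fields(String record) {
        String[] tokens = record.split("(,*\",*\"*)");
        if (tokens.length > 0 && tokens[0].isEmpty()) {
            return Arrays.copyOfRange(tokens, 1, tokens.length);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        System.out.println(Arrays.toString(record.split("(,*\",*\"*)"))); // [, 1, 2, 3, 4,5]
        System.out.println(Arrays.toString(fields(record)));             // [1, 2, 3, 4,5]
    }
}
```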


Initial solution (does not work with the .split method):

This regex pattern isolates the sections you want:
(?:\\")(.*?)(?:\\")

It uses non-capturing groups to isolate the pairs of escaped quotes, and a capturing group to isolate everything in between.
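Because this pattern is meant for Matcher.find rather than split, here is a minimal sketch of using it that way. It assumes the string value holds plain " characters (the \\" above targets the escaped form as it appears in Java source), so the pattern reduces to "(.*?)"; the non-capturing groups around a single quote are equivalent to the bare quote. The class and method names are hypothetical. Note that only quoted fields are captured, so the unquoted 3 is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldFinder {
    // Collects the text between each pair of quotes; unquoted fields are not matched.
    static List<String> quotedFields(String record) {
        Matcher m = Pattern.compile("\"(.*?)\"").matcher(record);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1)); // group 1 = contents between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String record = "\"1\",\"2\",3,\"4,5\"";
        System.out.println(quotedFields(record)); // [1, 2, 4,5]
    }
}
```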


Upvotes: 4

Wiktor Stribiżew

Reputation: 626690

My suggestion:

"([^"]+)"|(?<=,|^)([^,]*)

It matches "..."-like strings and captures only the text between the quotes into Group 1; otherwise it matches, and captures into Group 2, a run of characters other than , at the start of the string or right after a comma.

Here is some sample Java code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "value1,\"1\",\"2\",3,\"4,5\",value2";
        Pattern pattern = Pattern.compile("\"([^\"]+)\"|(?<=,|^)([^,]*)");
        Matcher matcher = pattern.matcher(s);
        List<String> res = new ArrayList<>();
        while (matcher.find()) {                     // Run the matcher
            if (matcher.group(1) != null) {          // If Group 1 matched
                res.add(matcher.group(1));           // add the quoted contents
            } else {
                res.add(matcher.group(2));           // otherwise add Group 2
            }
        }
        System.out.println(res); // => [value1, 1, 2, 3, 4,5, value2]
    }
}

Upvotes: 2