InfamousCoconut

Reputation: 794

Split string with double quotes in Java

I have a delimited file with records like this:

val1|"val2"|"val3|val4"|val5

I need to split each record so that a field keeps its quotes only if it contains the delimiter (|). If the field does not contain the delimiter, the quotes should be omitted.

Output should be like this:

col1=val1
col2=val2
col3="val3|val4"
col4=val5

I modified an answer to a similar question to arrive at the code below.

        String testData = "val1|\"val2\"|\"val3|val4\"|val5";
        char quote = '"';
        List<String> csvList = new ArrayList<String>();
        boolean inQuote = false;
        boolean delimInside = false;
        boolean isPrevQuoted = false;
        int lastStart = 0;
        for (int i = 0; i < testData.length(); i++) {
            if ((i + 1) == testData.length()) {

                if (inQuote && !delimInside) {

                    csvList.add(testData.substring(lastStart + 1, i));
                } else {
                    csvList.add(testData.substring(lastStart, i + 1));
                }
            }
            if (testData.charAt(i) == quote) {
                // if the character is quote
                if (inQuote) {
                    inQuote = false;
                    isPrevQuoted = true;
                    continue; // escape
                }
                inQuote = true;
                continue;
            }
            if (testData.charAt(i) == '|') {
                if (inQuote) {

                    delimInside = true;
                    continue;
                }
                if (isPrevQuoted && !delimInside) {

                    csvList.add(testData.substring(lastStart + 1, i - 1));
                } else {
                    csvList.add(testData.substring(lastStart, i));
                }

                delimInside = false;
                isPrevQuoted = false;
                lastStart = i + 1;
            }
        }

I am looking for a more elegant solution to the same problem. Thanks in advance.

Upvotes: 1

Views: 2111

Answers (1)

user1803551

Reputation: 13427

Here are examples of one way of doing it, with and without regex. First, split the string on the double-quote character ("):

// needs java.util.ArrayList and java.util.List
String test = "1|\"2\"|\"3|4\"|5|\"6|7|8\"";
List<String> list = new ArrayList<>();
String[] strings = test.split("\"");

The resulting pieces are:

1|
2
|
3|4
|5|
6|7|8

Without regex, I use a StringBuilder so as not to create too many strings. I trim the leading and trailing | from each piece and wrap the piece in " when it still contains a |:

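for (int i = 0; i < strings.length; i++) {
    // a lone "|" is only the separator between two quoted fields
    if (strings[i].equals("|"))
        continue;
    StringBuilder builder = new StringBuilder(strings[i]);
    // drop the delimiter left over at either end of the piece
    if (strings[i].startsWith("|"))
        builder.deleteCharAt(0);
    if (strings[i].endsWith("|"))
        builder.deleteCharAt(builder.length() - 1);
    // a delimiter still inside the piece: wrap the field in quotes
    if (builder.indexOf("|") != -1)
        builder.append("\"").insert(0, "\"");
    list.add(builder.toString());
}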

With regex, the pattern skips the leading and trailing |, so I only need to add the surrounding " when appropriate:

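// needs java.util.regex.Matcher and java.util.regex.Pattern
// group 1: a run of non-| characters, optionally followed by | and more text
Pattern pat = Pattern.compile("([^|]+(?:\\|.+)?)");

for (int i = 0; i < strings.length; i++) {
    Matcher m = pat.matcher(strings[i]);
    while (m.find()) {
        if (m.group(1).contains("|"))
            list.add("\"".concat(m.group(1)).concat("\""));
        else
            list.add(m.group(1));
    }
}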

After System.out.println(list), both versions print [1, 2, "3|4", 5, "6|7|8"] (starting from an empty list each time). You can use plain Strings instead of a StringBuilder, depending on your specific case.
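One caveat: because both loops work on the pieces produced by splitting on ", a run of adjacent unquoted fields such as |5|6 arrives as a single piece and ends up quoted as "5|6". If that can occur in your data, a single left-to-right scan avoids it. The following is only a minimal sketch, not a drop-in replacement: the class name QuotedFieldSplitter and its methods are hypothetical, and it assumes the quotes are balanced and never escaped inside a field. Unlike the loops above, it keeps empty fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration: one pass over the record, no pre-splitting.
final class QuotedFieldSplitter {

    // Splits a record on '|', ignoring delimiters that appear inside double quotes.
    // The quotes are kept only when the quoted field contains a delimiter.
    // Assumption: quotes are balanced and never escaped inside a field.
    static List<String> split(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuote = false;      // between an opening and a closing quote
        boolean wasQuoted = false;    // the current field contained quotes
        boolean delimInside = false;  // a '|' appeared inside the quotes

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == '"') {
                inQuote = !inQuote;
                wasQuoted = true;
            } else if (c == '|' && !inQuote) {
                // field boundary: emit the field and reset the per-field state
                fields.add(finish(field, wasQuoted && delimInside));
                field.setLength(0);
                wasQuoted = false;
                delimInside = false;
            } else {
                if (c == '|') {
                    delimInside = true; // only reachable while inQuote is true
                }
                field.append(c);
            }
        }
        // the last field has no trailing delimiter
        fields.add(finish(field, wasQuoted && delimInside));
        return fields;
    }

    private static String finish(StringBuilder field, boolean keepQuotes) {
        return keepQuotes ? "\"" + field + "\"" : field.toString();
    }

    public static void main(String[] args) {
        System.out.println(split("val1|\"val2\"|\"val3|val4\"|val5"));
        // prints [val1, val2, "val3|val4", val5]
    }
}
```

For the record 1|"2"|"3|4"|5|6 this sketch returns [1, 2, "3|4", 5, 6], leaving the unquoted 5 and 6 as separate fields.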

Upvotes: 1
