Bmoe

Reputation: 978

Java How to Remove Double-Quote Character Between Double Quote Text Qualifier

I have a CSV file where each field (except the column headings) has a double-quote text qualifier, e.g. field: "some value". However, some fields contain a double quote within the value: field2: "25" TV", field3: "25" x 14" x 2"", or field4: "A"bcd"ef"g". When a row contains data like fields 2-4, my Java file processing fails: because I specify the double quote as the text qualifier, the row appears to have too many fields. How do I do either or all of the following:

What is my level of control over this file? The file comes in as-is, but I just need data from two different columns in the file. I can do whatever I need to do to it to get that data.

Upvotes: 1

Views: 1145

Answers (3)

jalynn2

Reputation: 6457

Assuming that a comma is the column separator and that every column is surrounded by double quotes:

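    // Split on the quote-comma-quote sequence that sits between fields.
    String[] columns = input.split("\",\"");
    if (columns.length > 0) {
        // Strip the opening quote from the first column.
        columns[0] = columns[0].substring(1);
        // Strip the closing quote from the last column.
        String lastColumn = columns[columns.length - 1];
        columns[columns.length - 1] = lastColumn.substring(0, lastColumn.length() - 1);
    }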

The columns will still contain their internal double quotes. You can strip those with String.replace if you don't want them.
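As a rough sketch, the same idea can be wrapped in a helper that also keeps trailing empty fields and optionally strips the internal quotes. The class name QuotedCsvSplit, the method splitQuoted, and the stripInner flag are made up for illustration, and the code assumes every field is quoted and separated by exactly "," with no surrounding whitespace.

```java
import java.util.Arrays;

public class QuotedCsvSplit {

    // Hypothetical helper: assumes every field is wrapped in double quotes
    // and fields are separated by exactly quote-comma-quote.
    static String[] splitQuoted(String line, boolean stripInner) {
        if (line.length() < 2) {
            return new String[0];
        }
        // Drop the opening and closing quote of the whole line.
        String body = line.substring(1, line.length() - 1);
        // A limit of -1 keeps trailing empty fields.
        String[] columns = body.split("\",\"", -1);
        if (stripInner) {
            for (int i = 0; i < columns.length; i++) {
                // Remove the stray quotes left inside each value.
                columns[i] = columns[i].replace("\"", "");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "\"25\" TV\",\"25\" x 14\" x 2\"\",\"A\"bcd\"ef\"g\"";
        // Prints [25 TV, 25 x 14 x 2, Abcdefg]
        System.out.println(Arrays.toString(splitQuoted(line, true)));
    }
}
```

Note that stripping the internal quotes also removes inch marks such as the one in 25" TV, so pass false if those characters carry meaning.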

Upvotes: 0

dan.m was user2321368

Reputation: 1705

First, if it is indeed a CSV file, you should be using the presence of commas to break each line into columns.

Once it's broken into columns, if we know for sure that each value should begin and end with a double quote ("), we can simply remove all of the double quotes and then re-apply the ones at the beginning and end.

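    String input = "\"hello\",\"goodbye Java \"the best\" language\", \"this is really \"\"\"bad\"";
    // Split on every comma; this only works if no value contains a comma.
    String[] parsed = input.split(",");
    String[] clean = new String[parsed.length];
    int index = 0;
    for (String value : parsed) {
        // Trim whitespace around the field, drop every quote, then re-wrap the value.
        clean[index] = "\"" + value.trim().replace("\"", "") + "\"";
        index++;
    }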

If a comma could exist inside a value, the following should be used instead:

    String input = "\"hello\",\"goodbye,\" Java \"the best\" language\", \"this is really \"\"\"bad\"";
    // Split only where a quote, a comma (with optional whitespace), and a quote meet.
    String[] parsed = input.split("\"\\s*,\\s*\"");
    String[] clean = new String[parsed.length];
    int index = 0;
    for (String value : parsed) {
        // Drop every remaining quote, then re-wrap the value.
        clean[index] = "\"" + value.replace("\"", "") + "\"";
        index++;
    }

Note that if the sequence "\s*,\s*" (a double quote, a comma with optional surrounding whitespace, then another double quote) appears inside a value, the record is ambiguous. For example, in a two-column file, the input record "abc","def","ghi" could be either

    value 1 = "abc","def"
    value 2 = "ghi"

or

    value 1 = "abc"
    value 2 = "def","ghi"
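One way to catch such records, sketched here under the assumption that the expected column count is known in advance, is to compare the number of pieces the separator pattern produces against that count. The class AmbiguityCheck and the method isAmbiguous are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

public class AmbiguityCheck {

    // Same separator as above: quote, optional whitespace, comma,
    // optional whitespace, quote.
    private static final Pattern SEPARATOR = Pattern.compile("\"\\s*,\\s*\"");

    // Hypothetical check: a line that splits into more pieces than the file
    // has columns must contain the separator inside at least one value.
    static boolean isAmbiguous(String line, int expectedColumns) {
        return SEPARATOR.split(line, -1).length > expectedColumns;
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous("\"abc\",\"def\",\"ghi\"", 2)); // true
        System.out.println(isAmbiguous("\"abc\",\"def\"", 2));         // false
    }
}
```

Flagged rows could be logged and reviewed by hand rather than guessed at, since nothing in the line itself says which split was intended.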

Upvotes: 1

Micromuncher

Reputation: 903

Note that many CSV implementations escape a double quote inside a quoted field as two consecutive double quotes.

So "25"" TV" might (should?) be your input.
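If the file cannot be fixed at the source, one possible workaround is to rewrite each line into that doubled-quote form before handing it to a regular CSV parser. The sketch below is only an illustration: QuoteEscaper and escapeInnerQuotes are invented names, and it assumes every field is quoted, fields are separated by exactly ",", and no value contains that sequence.

```java
public class QuoteEscaper {

    // Hypothetical repair step: doubles every quote found inside a field
    // so the line follows the two-consecutive-quotes escaping convention.
    static String escapeInnerQuotes(String line) {
        if (line.length() < 2) {
            return line;
        }
        // Drop the outer quotes, then split on quote-comma-quote.
        String body = line.substring(1, line.length() - 1);
        String[] fields = body.split("\",\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            // Re-wrap each field, doubling any quote left inside it.
            out.append('"').append(fields[i].replace("\"", "\"\"")).append('"');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "25"" TV","some value"
        System.out.println(escapeInnerQuotes("\"25\" TV\",\"some value\""));
    }
}
```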

Upvotes: 0
