Atequer Rahman
Atequer Rahman

Reputation: 1219

Values within a CSV file contains unquoted comma characters

I have got a CSV file from a client where some cell contains multiple commas (,). So after splitting by comma, getting wrong value for some cells.

Example: for cell value of 187,859,049 I am getting 187. But I need 187,859,049.

My code for splitting comma_

        String line = "";

        //Create the file reader
        fileReader = new BufferedReader(new FileReader(readfileName));

        //Read the CSV file header to skip it
        fileReader.readLine();

        //Read the file line by line starting from the second line
        while ((line = fileReader.readLine()) != null) {
            //Get all tokens available in line
            String[] tokens = line.split(Constant.COMMA_DELIMITER);   
            if (tokens.length > 0) {
                //Create a new student object and fill his  data
                ShareNumber shareNumber = new ShareNumber(tokens[Constant.Cell_0],tokens[Constant.Cell_1],tokens[Constant.Cell_2]);
                shareNumberList.add(shareNumber);
            }

How can I solve this problem? Thanks in advance.

Upvotes: 2

Views: 973

Answers (3)

einpoklum
einpoklum

Reputation: 131395

First and foremost - get your client to not put commas in the numbers they write to CSV files.

Assuming you can't do that...

You'll need to be able to tell apart "real" from "fake" commas - using information about the actual data. Specifically, you could use information on:

  • Which fields are known to contain numbers
  • What the value range is (e.g. is it always over 999? then a "field" that's just 3 digits long may actually be part of a larger field than a comma)
  • The overall number of commas on the line; the number of "fake" commas is number-of-commas - (number-of-fields - 1). And if there's just one numeric field then this already tells you exactly which commas are "fake".

Also, this means either you don't start your processing of a line by breaking it up by commas, but rather by determining how to break it up (e.g. by matching regular expressions on it). Alternatively,you could start by breaking it up, but later on need to unify some broken-up parts of numbers.

Upvotes: 3

fjnk
fjnk

Reputation: 62

I read all answers above. I think, this type of weird problem occurs due to client's file issue generally. It wastes a lot of time of developers. So developers should always try to be strict with clients for this type of issue.

Upvotes: -1

Atequer Rahman
Atequer Rahman

Reputation: 1219

By formatting numeric cells in following the way I have fixed the issue.

Formatting number for removing unquoted comma characters.

Upvotes: 0

Related Questions