raghul P.R

Reputation: 1

How to concat cells split by comma using Java

I have a CSV file in which each cell is a number that may itself contain a comma. I would like to import it into my program and split it into cells. Is there any way to join the numbers correctly?

For example the following file content:

column1 column2 column3 column4
10      10,000  100     1000   

After splitting by comma, the result is:

column1 column2 column3 column4 column5
10       10     000      100     1000

while the expected output would be:

column1 column2 column3 column4 
10      10000   100     1000 

Here is my code:

File file = new File("csv file");

try (BufferedReader bufRdr = new BufferedReader(new FileReader(file))) {

    while((line = bufRdr.readLine()) != null)
    {
        if((i<5) || (i == myDimensionProcessor.rowCount-1))
        {  
            i++;
        }
        else
        {
            String a = line.replaceAll("[/\\\\%|*^`]", "");
            c = a.replaceAll("--", "NA");                                                    
            i++;
            StringTokenizer st = new StringTokenizer(c,",");

            while (st.hasMoreTokens())
            {
                //get next token and store it in the array
                numbers[row][col] = st.nextToken();
                System.out.print(numbers[row][col]);
                col++;     
            }
            col = 0;
            System.out.println();
            row++;
        }
    }
}

Actually, I am splitting the CSV file values into an array, but a value that contains a comma should not be split. How can I do that?

Or, if the comma value does get split into different columns, how can I append the split parts back into a single cell so that the remaining values automatically shift back into the preceding columns?

Upvotes: 0

Views: 1275

Answers (3)

Jeronimo Backes

Reputation: 6289

CSV is way more complicated to process correctly than it looks. You are not going to end up with a reliable solution using regular expressions, not to mention that it is going to be extremely slow.
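To illustrate the point, here is a minimal sketch, assuming the number with the comma is wrapped in double quotes (the question does not say whether it is). The class and method names are made up for this example. It compares a naive split with a small quote-aware loop, and the loop still ignores escaped quotes, embedded newlines and other cases a real parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of any library.
class NaiveSplitDemo {

    // Naive approach: every comma is treated as a delimiter.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Minimal quote-aware split: commas inside double quotes stay in the field.
    // Simplification: doubled quotes ("") used as escapes are not handled.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char ch : line.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;            // toggle state, drop the quote itself
            } else if (ch == ',' && !inQuotes) {
                fields.add(current.toString());  // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString());          // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "10,\"10,000\",100,1000";
        System.out.println(naiveSplit(line).length); // 5 tokens instead of 4
        System.out.println(quoteAwareSplit(line));   // [10, 10,000, 100, 1000]
    }
}
```

Even this small loop already needs state tracking, which is why hand-rolled splitting tends to break on real files.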

Just use a library dedicated to that, such as uniVocity-parsers, and save yourself the headache. Here's a simple example:

CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(file));

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Upvotes: 0

Emanuel Seidinger

Reputation: 1272

If your numbers with commas are enclosed in some kind of quotation marks, you should first replace numbers like "10,000" with 10000 and then do the tokenization. You can do it like this:

line = line.replaceAll("\"\\s*(\\d+),(\\d+)\\s*\"", "$1$2");

Or, if you would like to preserve the quotation marks:

line = line.replaceAll("(\")\\s*(\\d+),(\\d+)\\s*(\")", "$1$2$3$4");

You need to replace the \" in the code samples above with the appropriate quotation mark character.

If all the numbers in your CSV are in quotation marks, you can remove the commas inside numbers like this:

line = line.replaceAll("(\\d+),(\\d+)", "$1$2");
line = line.replace("\"", "");

This works as long as no CSV delimiter comma sits between two digits, which is safe to assume when all your numbers are in quotation marks. Use the second line if you also want to remove the quotation marks.
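The replacements above can be tried out with a small sketch like the following. The class and method names are hypothetical. One addition that is not in the snippets above: a single pass of the digit-comma-digit replacement removes only one comma per number ("1,000,000" becomes "1000,000"), so the second helper repeats the replacement until the line stops changing.

```java
// Hypothetical helper class wrapping the replacements shown above.
class QuotedNumberCleanup {

    // Turns a quoted number with one comma, e.g. "10,000", into 10000 (quotes dropped).
    static String unquoteNumbers(String line) {
        return line.replaceAll("\"\\s*(\\d+),(\\d+)\\s*\"", "$1$2");
    }

    // Removes every comma that sits directly between two digits.
    // Assumption: all numbers are quoted, so delimiter commas never touch two digits.
    // The loop is an addition so that numbers with several commas are handled too.
    static String stripDigitCommas(String line) {
        String previous;
        do {
            previous = line;
            line = line.replaceAll("(\\d+),(\\d+)", "$1$2");
        } while (!line.equals(previous));
        return line;
    }

    public static void main(String[] args) {
        System.out.println(unquoteNumbers("10,\"10,000\",100,1000"));
        String cleaned = stripDigitCommas("\"10\",\"1,000,000\",\"100\"");
        System.out.println(cleaned);
        // Optionally drop the quotation marks before tokenizing.
        System.out.println(cleaned.replace("\"", ""));
    }
}
```

After either helper has run, the line can be handed to the existing StringTokenizer with "," as the delimiter.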

Upvotes: 1

OneCricketeer

Reputation: 191701

If your column with commas in it is not quoted, then you'll need to extract each field manually.

I ran into this issue at work the other day and here was my thought process.

  1. You have 4 columns
  2. Say the second column has commas

I formed a regex starting with an empty group for each column, so 4 columns separated by a comma and optional whitespace would be

(),\s*(),\s*(),\s*()$

Then I filled in the groups with the data I wanted to match

(\d+),\s*(.+),\s*(\d+),\s*(\d+)$

In this case you have plain numbers in every column except the second, so the second group grabs any characters it can find. There may be a better way to write that, and I would certainly like to know, but I was in a time crunch when I figured it out. To show that it works, there is a Regex101 demo.

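Following the other answer, here is the relevant Java code. The replacement has to keep a separator between the groups, because "$1$2$3$4" would fuse all four columns into one string. This version rewrites the column delimiters to semicolons and leaves the comma inside the second column alone, so you can tokenize on ";" afterwards and strip the comma from the second field:

line = line.replaceAll("(\\d+),\\s*(.+),\\s*(\\d+),\\s*(\\d+)$", "$1;$2;$3;$4");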
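The same regex can also be applied through a Matcher, reading each group directly instead of rewriting the line. The following is a minimal sketch under the assumptions stated above (exactly 4 columns, only the second one may contain commas). The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes 4 columns where only the second may contain commas.
class UnquotedColumnExtractor {

    private static final Pattern ROW =
            Pattern.compile("(\\d+),\\s*(.+),\\s*(\\d+),\\s*(\\d+)$");

    // Returns the four fields of a row, with the commas removed from the second one.
    static String[] parseRow(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row layout: " + line);
        }
        return new String[] {
            m.group(1),
            m.group(2).replace(",", ""), // "10,000" becomes "10000"
            m.group(3),
            m.group(4)
        };
    }

    public static void main(String[] args) {
        // Raw line from the question's example: 10 | 10,000 | 100 | 1000
        System.out.println(Arrays.toString(parseRow("10,10,000,100,1000")));
        // prints [10, 10000, 100, 1000]
    }
}
```

Because the groups are read one by one, the column boundaries are never lost, and the returned array can be copied straight into a row of the numbers array from the question.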

Upvotes: 0
