Reputation: 263
I have been researching this all day, and no matter how I code it, the result is not what I want it to be.
First things first: I am working with Big Data, so I do not think it is efficient to keep copying and pasting row entries. I am reading a CSV file, and it is working; it cuts out everything I tell it to cut out. Everything is fine so far. The only thing going wrong is that (in my opinion) the Java code, run from Eclipse, cuts the headers/column names out of the CSV file. How do I fix this?
package data;

import java.io.FileReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import com.opencsv.CSVReader;

public class BelgiumParser {

    public static void main(String[] args) {
        //List<String> listBelgium;
        String fileName = "src\\data\\Belgium.csv";
        try {
            List<String> listBelgium = Files.readAllLines(Paths.get(fileName));
            //CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', 1);
            for (String line : listBelgium) {
                line = line.replace("\"", "");
                line = line.replaceAll("T", " ");
                line = line.replaceAll("Z", "");
                System.out.println(line);
            }
        } catch (Exception e) {
            //System.out.println(e.getMessage());
            e.printStackTrace();
        }
    }
}
I also tried the while loop:
while ((line = bufferedReader.readLine()) != null) {...}
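For completeness, a minimal sketch of what I mean with BufferedReader, reading the first line separately so the header is kept (the class name is just a placeholder; this is an illustration, not tested against my real file):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BelgiumReaderSketch {
    public static void main(String[] args) throws IOException {
        String fileName = "src\\data\\Belgium.csv";
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
            // read the header line first so it is not lost in the loop below
            String header = bufferedReader.readLine();
            System.out.println("Headers: " + header);

            String line;
            while ((line = bufferedReader.readLine()) != null) {
                // clean up: drop quotes, turn T into a space, drop Z
                line = line.replace("\"", "").replace("T", " ").replace("Z", "");
                System.out.println(line);
            }
        }
    }
}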
Yes, I tried both BufferedReader and CSVReader. I might even have found the Python solution to this:
headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)
That is not my code; I don't know how to link things here. Main questions:
The file contains hundreds of rows of data, where:
- no measurement equals null
- a measurement is an integer or a double (?)
What should happen is:
- In the timestamps, the T and the Z have to go: T should become a space " " and Z just "".
- In column B and higher, row 1 should contain only the plant name itself.
Eventually, I should be able to put all of this into a MySQL database in a clear format, so that it can be used for a D3.js line chart in a JavaServer Faces application (class?).
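For the MySQL part, something along these lines with plain JDBC is what I have in mind (the connection details and the table/column names below are placeholders I made up, not an existing schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MeasurementLoader {
    public static void main(String[] args) throws Exception {
        // placeholder connection details and table layout
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/energy", "user", "password");
             PreparedStatement insert = conn.prepareStatement(
                 "INSERT INTO belgium_generation (measured_at, plant_name, value) VALUES (?, ?, ?)")) {
            // one insert per cleaned CSV row; value may be null when there is no measurement
            insert.setString(1, "2016-01-01 00:00:00");
            insert.setString(2, "SomePlant");
            insert.setObject(3, 123.45); // or insert.setObject(3, null)
            insert.executeUpdate();
        }
    }
}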
Upvotes: 1
Views: 2682
Reputation: 263
I finally achieved what I was trying to do:
package code;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BelgiumParser {

    public static void main(String[] args) throws IOException {
        String fileName = "src/data/Belgium.csv";
        try (CSVReader reader = new CSVReader(new FileReader(fileName), ',', '"', 1)) {
            String[] nextLine;
            while ((nextLine = reader.readNext()) != null) {
                for (String line : nextLine) {
                    line = line.replaceAll("T", " ");
                    line = line.replaceAll("Z", "");
                    line = line.replaceAll("ActualGenerationPerUnit.mean", "");
                    line = line.replaceAll("Plantname:", "");
                    //Escaping curly braces is a must!
                    line = line.replaceAll("\\{", "");
                    line = line.replaceAll("\\}", "");
                    System.out.println(line);
                }
            }
        }
    }
}
Still not efficient enough, but it does the job.
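One thing that might still help: all the replacements are literal strings, so String.replace (which does not compile a regex on every call) should be enough, for example with a small helper like this (just a sketch of the idea):

// same cleanup with literal replaces instead of regex-based replaceAll
private static String clean(String value) {
    return value.replace("T", " ")
                .replace("Z", "")
                .replace("ActualGenerationPerUnit.mean", "")
                .replace("Plantname:", "")
                .replace("{", "")  // no escaping needed, replace() is literal
                .replace("}", "");
}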
Upvotes: 0
Reputation: 6289
If you are dealing with Big Data, then I recommend you get univocity-parsers, as it is much faster than anything else. Also, try not to load all rows into memory, because that is an obvious problem; stream them instead. Here's a simple example to get you started:
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically(); //you can configure the format manually if you prefer.
settings.setHeaderExtractionEnabled(true); //you want to get the headers from the input
settings.selectFields("a", "b", "c"); //select just the columns you need.

CsvParser parser = new CsvParser(settings);

File input = Paths.get(fileName).toFile();
parser.beginParsing(input, "UTF-8");

String[] row;
while ((row = parser.parseNext()) != null) {
    //do your stuff here.

    //here are your headers
    String[] headers = parser.getContext().parsedHeaders();
}
Your second question, if I understood it correctly, is that you want to transpose the rows, i.e. have all data of a column associated with its header. For that, use a ColumnProcessor (this loads all data into memory; I'll show you the alternative later):
ColumnProcessor columnProcessor = new ColumnProcessor();
settings.setProcessor(columnProcessor);

CsvParser parser = new CsvParser(settings);
parser.parse(input, "UTF-8"); //all rows are submitted to the processor created above.

//At the end of the process, you can get your data like this:
Map<String, List<String>> columnValues = new TreeMap<String, List<String>>(columnProcessor.getColumnValuesAsMapOfNames());
If you have too much data, you'll need to perform the transpose operation in batches. Use a BatchedColumnProcessor for that:
BatchedColumnProcessor columnProcessor = new BatchedColumnProcessor(20000 /* runs batches of 20000 rows each */) {
    @Override
    public void batchProcessed(int rowsInThisBatch) {
        Map<Integer, List<String>> columnsByIndex = getColumnValuesAsMapOfIndexes();
        //process your batch here
    }
};
This should work perfectly. Hope it helps.
Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license).
Upvotes: 2
Reputation: 1294
CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', 1);
With the last parameter in the code above, you are asking the CSVReader to skip line 1 while reading the file. Instead, use the default (zero) so that it reads the headers as well.
CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', CSVReader.DEFAULT_SKIP_LINES);
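With the default, the first call to readNext() then returns the header row, much like the next(reader, None) line in the Python snippet from the question. A small sketch continuing from the reader above (IOException handling omitted):

String[] headers = reader.readNext(); // first row: the column names
String[] row;
while ((row = reader.readNext()) != null) {
    // process the data rows here
}
reader.close();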
Regarding the second question, you would have to write some custom logic: read the lines into arrays or lists (which maintain the order) and handle the writing with an incremental index, as sketched below.
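A rough sketch of that idea with opencsv (readAll keeps the row order and the outer index walks the columns; purely illustrative, not tuned for large files):

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.List;

public class TransposeSketch {
    public static void main(String[] args) throws Exception {
        try (CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"));
             CSVWriter writer = new CSVWriter(new FileWriter("src\\data\\Belgium-transposed.csv"))) {
            List<String[]> rows = reader.readAll();       // keeps the original row order
            int columns = rows.get(0).length;
            for (int col = 0; col < columns; col++) {     // incremental index over the columns
                String[] transposed = new String[rows.size()];
                for (int row = 0; row < rows.size(); row++) {
                    transposed[row] = rows.get(row)[col];
                }
                writer.writeNext(transposed);             // one output row per input column
            }
        }
    }
}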
Upvotes: 1
Reputation: 63
The best way to do this is probably to have it read each value of a column and store it into an array, then write a new, transformed CSV file that prints the entire array as one row, in any order you want.
I can't really give you pseudocode, because I am not completely familiar with any CSV reader libraries, but it is usually easy to find one and use its Javadoc to implement it.
Upvotes: 0