john

Reputation: 707

Out of memory with superCSV java library

Here is the code that counts the number of lines in a file. It uses BufferedReader and works without any problem. In total the file has over 25,000,000 rows.

    BufferedReader br = new BufferedReader(new FileReader("C:\\...test.csv"));
    int lineNbr = 0;
    while (br.readLine() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);
        }
    }
    br.close();
    System.exit(0);

Here is similar code using SuperCSV. It throws an out-of-memory error after line 11,000,000.

    CsvListReader reader = new CsvListReader(new FileReader("C:\\... test.csv"), CsvPreference.EXCEL_PREFERENCE);

    List<String> row = reader.read();
    row = reader.read();
    lineNbr = 0;
    while (reader.read() != null) {
        lineNbr++;
        if (lineNbr % 1000000 == 0) {
            System.out.println(lineNbr);
        }
    }

    reader.close();
    System.exit(0);

What am I doing wrong? How do I correctly read a file with SuperCSV?

Upvotes: 1

Views: 413

Answers (2)

kaliatech

Reputation: 17867

Based on your sample code and quick review of the SuperCSV code, I don't see any reason for an OutOfMemory exception to be thrown. I suspect you did not post all information in your sample, or something else is at play.

You can review the source code for SuperCSV here:

I do not see any state being stored that would cause referenced heap memory to grow in a way that could not be garbage collected.

Another possibility is that your CSV file is corrupt, perhaps missing line breaks at some point. The library calls readLine in at least one place, so a long stretch of the file without a line break would be pulled into memory as a single line.
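One way to check the missing-line-break idea is to scan the file with plain BufferedReader and report the longest physical line. The sketch below is only an illustration: the class name LongestLineCheck and the method findLongestLine are made up for this example, and it uses nothing from SuperCSV itself.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper (not part of SuperCSV): finds the longest physical line in a file.
class LongestLineCheck {

    // Returns {lineNumber, length} of the longest line (line numbers start at 1),
    // or {0, 0} if the file is empty.
    static long[] findLongestLine(String path) throws IOException {
        long lineNbr = 0;
        long longestLineNbr = 0;
        long longestLength = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                lineNbr++;
                if (line.length() > longestLength) {
                    longestLength = line.length();
                    longestLineNbr = lineNbr;
                }
            }
        }
        return new long[] { longestLineNbr, longestLength };
    }

    public static void main(String[] args) throws IOException {
        // The path is passed on the command line.
        long[] result = findLongestLine(args[0]);
        System.out.println("Longest line: #" + result[0] + " (" + result[1] + " chars)");
    }
}
```

If the longest line turns out to be far larger than a typical row, that region of the file is a reasonable place to start looking for damage.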

Upvotes: 4

GhostCat

Reputation: 140437

The major difference: your first example simply reads a line from the file and discards it.

Your second example does more than read a string: keep in mind that each call to read() returns a List<String>. In other words, the CSV reader library is probably doing its job and parsing all of your input data. That requires far more resources than reading lines and throwing them away.

So, most likely, the second example creates garbage at such a high rate that the garbage collector cannot keep up with it.
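To see whether heap usage really climbs while rows are parsed, one option is to print the used heap next to the row counter. The following is a rough sketch under stated assumptions: HeapWatch, usedHeapBytes and countRows are hypothetical names, and a naive String.split(",") stands in for the real CSV parser (it ignores quoting, so it is not a substitute for SuperCSV).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper: counts rows and reports approximate heap usage.
class HeapWatch {

    // Approximate number of heap bytes currently in use by this JVM.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Reads the file line by line, builds a List<String> per row, and logs
    // the used heap every reportEvery rows. Returns the total row count.
    static long countRows(String path, long reportEvery) throws IOException {
        long rows = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Stand-in for a real CSV parser: naive split, no quote handling.
                List<String> row = Arrays.asList(line.split(","));
                rows++;
                if (rows % reportEvery == 0) {
                    long usedMb = usedHeapBytes() / (1024 * 1024);
                    System.out.println(rows + " rows, last row had " + row.size()
                            + " columns, used heap ~" + usedMb + " MB");
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // The path is passed on the command line; report every 1,000,000 rows.
        System.out.println("Total rows: " + countRows(args[0], 1_000_000L));
    }
}
```

A used-heap figure that rises and falls between reports would be consistent with short-lived garbage being collected, while a figure that only grows would suggest that something is holding on to the data.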

Upvotes: 3
