Spade Johnsson

Reputation: 572

Parsing a .csv file in Java throws an out-of-bounds exception

I have the following issue: I am trying to parse a .csv file in Java and store 3 specific columns of it in a two-dimensional array. The code for the method looks like this:

    public static void parseFile(String filename) throws IOException {
        FileReader readFile = new FileReader(filename);
        BufferedReader buffer = new BufferedReader(readFile);
        String line;
        String[][] result = new String[10000][3];
        String[] b = new String[6];

        for (int i = 0; i < 10000; i++) {
            while ((line = buffer.readLine()) != null) {
                b = line.split(";", 6);
                System.out.println("ID: " + b[0] + " Title: " + b[3] + "Description: " + b[4]); // Here is where the outofbounds exception occurs...

                result[i][0] = b[0];
                result[i][1] = b[3];
                result[i][2] = b[4];
            }
        }
        buffer.close();
    }

I should point out that the .csv file is huge: it has 32 columns and almost 10,000 entries. When parsing, I keep getting the following:

    [... many lines of successfully extracted output ...]
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at ParseCSV.parseFile(ParseCSV.java:24)
    at ParseCSV.main(ParseCSV.java:41)

However, I noticed that some of the content in the file is oddly formatted: some of the text fields, for instance, have line breaks in them, even though there is no newline character involved in any visible way. If I delete those blank lines manually, the output generated before the error message appears adds the entries to the array up until the next blank line. Does anyone have an idea how to fix this? Any help would be greatly appreciated.

Upvotes: 2

Views: 2263

Answers (4)

Emily Crutcher

Reputation: 658

Your first problem is that you probably have at least one blank line in your CSV file. You need to replace:

    b = line.split(";", 6);

with

    b = line.split(";");
    if (b.length < 5) {
        System.err.println("Warning, line has only " + b.length +
                           " entries, so skipping it:\n" + line);
        continue;
    }

If your input can legitimately contain newlines or embedded semicolons within your entries, that is a more complex parsing problem, and you are probably better off using a third-party parsing library; there are several very good ones.

If your input is not supposed to contain newlines, the problem is probably \r. Windows uses \r\n to represent a newline, while most other systems use just \n. If multiple people or programs edited your text file, it is entirely possible to end up with stray \r characters on their own, which most parsers do not handle well.

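One caveat when checking whether that's your problem: BufferedReader.readLine() itself treats a lone \r as a line terminator, so a stray \r splits one record into two lines before your code ever sees it. Doing

    line = line.replace("\r", "");

before you split your line will therefore not reveal it; look for \r characters in the raw file contents instead, or open the file in an editor that can display line endings.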
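One way to look for stray carriage returns is to read the raw file contents and count every \r that is not immediately followed by \n, since that is exactly the case BufferedReader.readLine() silently turns into a line break. This is a minimal sketch; the class name StrayCrCheck and the method countLoneCarriageReturns are hypothetical, and it assumes the file uses an ASCII-compatible encoding such as UTF-8 or Windows-1252.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StrayCrCheck {

    // Counts '\r' characters that are not immediately followed by '\n'.
    // BufferedReader.readLine() ends a line at each of these.
    static int countLoneCarriageReturns(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            boolean nextIsLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
            if (text.charAt(i) == '\r' && !nextIsLf) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java StrayCrCheck <file.csv>");
            return;
        }
        // ISO-8859-1 maps every byte to exactly one char, so '\r' bytes are
        // counted correctly for any ASCII-compatible file encoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.println("Lone \\r characters: " + countLoneCarriageReturns(text));
    }
}
```

A non-zero count means readLine() is splitting some of your records wherever a lone \r appears.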

If this is a process you are repeating many times, you might want to consider a Scanner with a custom delimiter (or a library) for more robust text processing. Otherwise, you can make do with this.

Upvotes: 2

Mrinal Bhattacharjee

Reputation: 1406

Please check b.length before accessing b[]. Since you read b[3] and b[4], you need b.length >= 5; b.length > 0 alone is not enough.

Upvotes: 0

whpratt

Reputation: 1

String's split(regex, limit) method returns an array sized to the number of tokens found, up to the number specified by the limit parameter. The limit is the maximum, not the minimum, number of array elements returned.

"1,2,3" split with (",", 6) will return an array of 3 elements: "1", "2" and "3".

"1,2,3,4,5,6,7" will return 6 elements: "1", "2", "3", "4", "5" and "6,7". The last element is goofy because the split method stopped splitting after 5 and returned the rest of the source string as the sixth element.

An empty line is represented as an empty string (""). Splitting "" will return an array of 1 element, the empty string.
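To see these limit semantics directly, here is a small runnable demo (the class name SplitLimitDemo is just for illustration). The commented output follows from String.split's documented behavior described above.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        // The limit caps the array length; it does not pad it.
        System.out.println(Arrays.toString("1,2,3".split(",", 6)));         // [1, 2, 3]
        // Splitting stops after limit - 1 matches; the rest stays in the last element.
        System.out.println(Arrays.toString("1,2,3,4,5,6,7".split(",", 6))); // [1, 2, 3, 4, 5, 6,7]
        // An empty line yields a one-element array containing "".
        String[] b = "".split(";", 6);
        System.out.println(b.length);                                        // 1
        // So b[3] on an empty line throws ArrayIndexOutOfBoundsException.
    }
}
```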

In your case, the string array created here

    String[] b = new String[6];

and assigned to b is replaced by the array returned by

    b = line.split(";",6);

and meets its ultimate fate at the hands of the garbage collector, unseen and unloved.

Worse, in the case of the empty lines, it's replaced by a one-element array, so

    System.out.println("ID: "+b[0]+" Title: "+b[3]+ "Description: "+b[4]);

blows up when trying to access b[3].

The suggested solution is either

    while ((line = buffer.readLine()) != null) {
        if (line.length() != 0) {
            b = line.split(";", 6);
            System.out.println("ID: " + b[0] + " Title: " + b[3] + " Description: " + b[4]);
            // ... store b[0], b[3] and b[4] in result ...
        }
    }

or (better, because the previous version could still trip over a malformed line)

    while ((line = buffer.readLine()) != null) {
        b = line.split(";", 6);
        if (b.length == 6) {
            System.out.println("ID: " + b[0] + " Title: " + b[3] + " Description: " + b[4]);
            // ... store b[0], b[3] and b[4] in result ...
        }
    }

You might also want to think about the for loop around the while. I don't think it's doing you any good.

    while ((line = buffer.readLine()) != null)

is going to read every line in the file, so

    for (int i = 0; i < 10000; i++) {
        while ((line = buffer.readLine()) != null) {

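is going to read every line in the file the first time through. Then it is going to make 9999 more attempts to read the file, find nothing new each time, and exit the while loop immediately. Worse, i stays 0 during that entire first pass, so every row is written to result[0], overwriting the one before it.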

Nor does the for loop protect you from reading more than 10000 lines: the while loop keeps reading past line 10000, so once you index the array by a line counter, a file with more than 10000 lines will overrun it. Look into replacing the big array with an ArrayList or Vector, as they grow to fit your file.
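A minimal sketch of that suggestion, assuming only columns 0, 3 and 4 are needed and that no field contains a quoted ';' or an embedded newline. The class and method names (ParseCsvList, parse) are hypothetical. It collects rows in an ArrayList, skips lines with too few fields instead of crashing, and takes a Reader so it can be tried on a string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ParseCsvList {

    // Keeps columns 0 (ID), 3 (Title) and 4 (Description) of each ';'-separated line.
    // Lines with fewer than 5 fields (blank lines, fragments of a broken record)
    // are reported on stderr and skipped instead of throwing.
    // Assumption: no field contains a quoted ';' or an embedded newline.
    static List<String[]> parse(Reader in) throws IOException {
        List<String[]> result = new ArrayList<>(); // grows as needed, no 10000-row cap
        try (BufferedReader buffer = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = buffer.readLine()) != null) {
                lineNo++;
                String[] b = line.split(";", 6);
                if (b.length < 5) {
                    System.err.println("Skipping line " + lineNo + " (" + b.length + " field(s)): " + line);
                    continue;
                }
                result.add(new String[] {b[0], b[3], b[4]});
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "1;a;b;Title1;Desc1;more;columns\n\n2;a;b;Title2;Desc2\n";
        for (String[] row : parse(new StringReader(sample))) {
            System.out.println("ID: " + row[0] + " Title: " + row[1] + " Description: " + row[2]);
        }
    }
}
```

To read a file, call ParseCsvList.parse(new FileReader(filename)); the try-with-resources block closes the reader when parsing finishes, so there is no separate close() call to forget.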

Upvotes: 0

Zlelik

Reputation: 570

When a field in your CSV file contains a newline, then after the line while((line = buffer.readLine()) != null){ the variable line will hold not a complete CSV record but just a fragment of text, possibly without any ;

For example, if you have the file

    column1;column2;column
    3 value

after the first iteration, the variable line will contain

    column1;column2;column

and after the second iteration it will contain "3 value".

When you call "3 value".split(";",6), it returns an array with one element, so when you later access b[3], it throws the exception.

The CSV format has many small details that take a lot of time to implement yourself. This is a good article covering the common CSV cases: http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples

I would recommend using a ready-made CSV parser, such as this one:

https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html
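If adding a library really isn't an option, the core of what such a parser does can be sketched as a small character-by-character state machine that tracks whether it is inside a double-quoted field. This minimal sketch assumes RFC 4180-style quoting (fields containing ';' or line breaks are wrapped in double quotes, and a literal quote is written as "") with ';' as the separator; QuotedCsvReader and parse are hypothetical names, and it is no substitute for a tested library. Note that if the file does not quote its multi-line fields at all, no parser can reliably recover where records end.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvReader {

    // Parses the whole text into records of fields. Separators and line breaks
    // inside double-quoted fields are kept as part of the field value, and a
    // doubled quote ("") inside a quoted field becomes a single '"'.
    // Outside quotes, "\r\n", "\n" and a lone '\r' each end a record.
    // A blank line yields a record with a single empty field.
    static List<List<String>> parse(String text, char sep) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote
                        i++;               // skip the second quote
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(ch);      // includes sep, '\r' and '\n'
                }
            } else if (ch == '"') {
                inQuotes = true;           // opening quote
            } else if (ch == sep) {
                record.add(field.toString());
                field.setLength(0);
            } else if (ch == '\n' || ch == '\r') {
                if (ch == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                   // treat "\r\n" as one line break
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(ch);
            }
        }
        // Flush the last record when the text does not end with a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = "ID;x;y;Title;Description\n"
                + "1;a;b;\"Multi-line\ntitle\";\"Has ; inside\"\n";
        for (List<String> rec : parse(sample, ';')) {
            if (rec.size() < 5) {
                continue; // blank or short record
            }
            System.out.println("ID: " + rec.get(0) + " Title: " + rec.get(3) + " Description: " + rec.get(4));
        }
    }
}
```

It works on the whole file contents at once (for example, new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8), assuming UTF-8), which is fine for a file of around 10,000 rows.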

Upvotes: 0
