cbmeeks

Reputation: 11420

Why do I keep getting an ArrayIndexOutOfBoundsException with this code?

I'm a Java newbie and can't figure out why this crude, 20-minute app is throwing that exception.

Basically I am parsing a 192MB (yes, 192MB) tab-delimited text file and storing the contents into MongoDB.

package get_alternatenames;

import java.io.BufferedReader;
import java.io.FileReader;

import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import java.util.Set;

/**
 *
 * @author cbmeeks
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        String alternateNamesFileName = "/Users/cbmeeks/Projects/GetData/geonames/alternateNames.txt";
        String line;

        // MongoDB
        Mongo m = new Mongo("localhost", 27017);
        DB db = m.getDB("mydb");

        // Build AlternateNames
        DBCollection altNames = db.getCollection("alternatenames");
        BufferedReader bReader = new BufferedReader(new FileReader(alternateNamesFileName));

        int isPreferredName = 0;
        int isShortName = 0;
        int lines = 0;

        System.out.println("Starting AlternateNames import...");

        while ((line = bReader.readLine()) != null) {
            String l[] = line.split("\t");
            BasicDBObject altName = new BasicDBObject();
            altName.put("alternateNameId", l[0]);
            altName.put("geonameId", l[1]);
            altName.put("isoLanguage", l[2]);
            altName.put("alternateName", l[3]);

            isPreferredName = 0;
            isShortName = 0;

            try {
                if (l[4] != null) {
                    isPreferredName = Integer.parseInt(l[4]);
                }
            } catch (ArrayIndexOutOfBoundsException ex) {
                isPreferredName = 0;
            } catch (Exception ex) {
                isPreferredName = 0;
            }

            try {
                if (l[5] != null) {
                    isShortName = Integer.parseInt(l[5]);
                }
            } catch (ArrayIndexOutOfBoundsException ex) {
                isShortName = 0;
            } catch (Exception ex) {
                isShortName = 0;
            }

            altName.put("isPreferredName", isPreferredName);
            altName.put("isShortName", isShortName);

            altNames.insert(altName);

            lines++;
        }

        bReader.close();
        System.out.println("Number of lines parsed: " + lines);

        System.out.println("Creating indexes...");
        altNames.createIndex(new BasicDBObject("geonameId", 1));
        altNames.createIndex(new BasicDBObject("isoLanguage", 1));
        altNames.createIndex(new BasicDBObject("alternateName", 1));

    }
}

I know this isn't the most beautiful code in the world. And it actually seems to work until around the end. It successfully imports 5.4 million records and then ends with:

Starting AlternateNames import...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
Java Result: 1
BUILD SUCCESSFUL (total time: 2 minutes 58 seconds)

I can't find what the problem is. I've tried searching the text file for a bad line, but at 192MB nothing seems able to open it except MacVim, and I can't quite get my head around that program. lol

But I am sure it isn't finishing the file. When I go to the last record imported in the text file (based on the record count in MongoDB) it appears to look fine...but I could be missing something.

Any suggestions?

Thanks.

BTW, kudos to Java for parsing that text file in under 3 minutes...

Upvotes: 0

Views: 1730

Answers (4)

cbmeeks

Reputation: 11420

Here is my corrected code that works. Thanks all for the tips.

package get_alternatenames;

import java.io.BufferedReader;
import java.io.FileReader;

import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import java.util.Set;

/**
 *
 * @author cbmeeks
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        String alternateNamesFileName = "/Users/cbmeeks/Projects/GetData/geonames/alternateNames.txt";
        String line;

        // MongoDB
        Mongo m = new Mongo("localhost", 27017);
        DB db = m.getDB("MyDB");

        // Build AlternateNames
        DBCollection altNames = db.getCollection("alternatenames");
        BufferedReader bReader = new BufferedReader(new FileReader(alternateNamesFileName));

        int isPreferredName = 0;
        int isShortName = 0;
        int lines = 0;

        System.out.println("Starting AlternateNames import...");

        while ((line = bReader.readLine()) != null) {
            try {
                String l[] = line.split("\t");
                if (l.length >= 4) {
                    BasicDBObject altName = new BasicDBObject();
                    altName.put("alternateNameId", Integer.parseInt(l[0]));
                    altName.put("geonameId", Integer.parseInt(l[1]));
                    altName.put("isoLanguage", l[2]);
                    altName.put("alternateName", l[3]);

                    isPreferredName = 0;
                    isShortName = 0;

                    // Fields 5 and 6 are optional flags; either one may be absent or empty.
                    if (l.length >= 5 && !l[4].isEmpty()) {
                        isPreferredName = Integer.parseInt(l[4]);
                    }

                    if (l.length >= 6 && !l[5].isEmpty()) {
                        isShortName = Integer.parseInt(l[5]);
                    }

                    altName.put("isPreferredName", isPreferredName);
                    altName.put("isShortName", isShortName);

                    altNames.insert(altName);

                    lines++;
                }
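            } catch (Exception ex) {
                // Skip the malformed line, but report it instead of failing silently.
                System.err.println("Skipping malformed line: " + line);
            }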

        }

        bReader.close();
        System.out.println("Number of lines parsed: " + lines);

        System.out.println("Creating indexes...");
        altNames.createIndex(new BasicDBObject("geonameId", 1));
        altNames.createIndex(new BasicDBObject("isoLanguage", 1));
        altNames.createIndex(new BasicDBObject("alternateName", 1));

    }
}

Upvotes: 0

Steve B.

Reputation: 57284

This section

while ((line = bReader.readLine()) != null) {
            String l[] = line.split("\t");
            BasicDBObject altName = new BasicDBObject();
            altName.put("alternateNameId", l[0]);
            altName.put("geonameId", l[1]);
            altName.put("isoLanguage", l[2]);
            altName.put("alternateName", l[3]);

is the only section where you're accessing the array elements by index without a try/catch block for ArrayIndexOutOfBoundsException, so the exception has to be thrown somewhere in here. It will go boom on any line with fewer than 4 elements. Wrap the whole thing in a try/catch, or do as Bala suggests and test the length of l before entering that part of the code.

I'd want some kind of check pretty much anywhere you pull in data from an outside source and need correct content for things to work properly.
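
Since the question mentions that the bad line is hard to find in a 192MB file, here is a minimal sketch of the length check that also reports where the first short line is. It is not the asker's code: the class name `FieldCheck`, the method `firstShortLine`, and the constant `REQUIRED_FIELDS` are made up for illustration, and it takes any `BufferedReader` so it can be tried without the real file or MongoDB.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Iterator;

class FieldCheck {
    // The import loop reads l[0]..l[3] unconditionally, so four fields are required.
    static final int REQUIRED_FIELDS = 4;

    /**
     * Returns the 1-based number of the first line with fewer than
     * REQUIRED_FIELDS tab-separated fields, or -1 if every line is long enough.
     */
    static int firstShortLine(BufferedReader reader) {
        // lines() reads lazily, so the whole file is never held in memory.
        Iterator<String> lines = reader.lines().iterator();
        int lineNumber = 0;
        while (lines.hasNext()) {
            lineNumber++;
            if (lines.next().split("\t").length < REQUIRED_FIELDS) {
                return lineNumber;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "1\t100\ten\tLondon\n"
                      + "2\t100\tfr\n"              // only three fields
                      + "3\t101\tde\tBerlin\t1\n";
        BufferedReader reader = new BufferedReader(new StringReader(sample));
        System.out.println("First short line: " + firstShortLine(reader)); // prints 2
    }
}
```

For the real import, the same check would presumably go at the top of the while loop, with the reader wrapping a `FileReader` on the actual file instead of the sample string.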

Upvotes: 1

Gabe

Reputation: 86718

Since you haven't indicated what line your exception is on, I'm going to use my psychic debugging skills.

My psychic powers are telling me that you have a blank line at the end of your file, and when you go to look for the fields in it, you get an exception because there are no fields on a blank line.

Either look for a blank line, or don't try to look for fields that aren't there.
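
To make the blank-line case concrete, here is a small sketch of how `String.split` behaves on the kinds of lines such a file can contain. The class and method names (`SplitDemo`, `fieldCount`, `fieldCountKeepingEmpty`) and the sample rows are invented for illustration. Two things are worth noticing: a blank line splits into one empty string, so `l[0]` exists but `l[1]` does not, and trailing empty fields are dropped unless a negative limit is passed.

```java
class SplitDemo {
    // Default split: trailing empty strings are removed from the result.
    static int fieldCount(String line) {
        return line.split("\t").length;
    }

    // A negative limit keeps trailing empty strings.
    static int fieldCountKeepingEmpty(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        // A blank line yields a single empty string, so index 1 is out of bounds.
        System.out.println(fieldCount(""));                               // 1

        // Two empty trailing fields disappear with the default split...
        System.out.println(fieldCount("1\t2\ten\tParis\t\t"));            // 4

        // ...and survive with a negative limit.
        System.out.println(fieldCountKeepingEmpty("1\t2\ten\tParis\t\t")); // 6

        // Only the last empty field is dropped here.
        System.out.println(fieldCount("1\t2\ten\tParis\t1\t"));           // 5
    }
}
```

This is also why the array length varies from line to line even when every row in the file has the same number of tabs.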

Upvotes: 0

Bala R

Reputation: 108947

Why don't you add an array length check like this?

     String l[] = line.split("\t");
     if (l.length >= 4)   // only l[0]..l[3] are required; the two trailing flags may be missing
     {
         BasicDBObject altName = new BasicDBObject();
         altName.put("alternateNameId", l[0]);
         altName.put("geonameId", l[1]);
         altName.put("isoLanguage", l[2]);
         altName.put("alternateName", l[3]);
             ... 
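
The same kind of length check can cover the two optional flag columns, which replaces the try/catch blocks around `l[4]` and `l[5]` in the question. This is only a sketch under the assumption that a missing, empty, or non-numeric flag should count as 0, as the question's code does; `OptionalFlags` and `flagAt` are hypothetical names.

```java
class OptionalFlags {
    /**
     * Returns the integer stored at fields[index], or 0 if that field is
     * absent, empty, or not a valid number.
     */
    static int flagAt(String[] fields, int index) {
        if (index >= fields.length || fields[index].isEmpty()) {
            return 0;
        }
        try {
            return Integer.parseInt(fields[index].trim());
        } catch (NumberFormatException ex) {
            return 0;
        }
    }

    public static void main(String[] args) {
        String[] shortRow = "1\t2\ten\tParis".split("\t");
        String[] fullRow = "1\t2\ten\tParis\t\t1".split("\t");

        System.out.println(flagAt(shortRow, 4)); // 0, field is missing
        System.out.println(flagAt(fullRow, 4));  // 0, field is empty
        System.out.println(flagAt(fullRow, 5));  // 1
    }
}
```

With a helper like this, the loop body could set `isPreferredName` and `isShortName` in one line each, without relying on exceptions for control flow.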

Upvotes: 2
