Reputation: 11420
I'm a Java newbie and can't figure out why this crude, 20-minute app is throwing an ArrayIndexOutOfBoundsException.
Basically I am parsing a 192MB (yes, 192MB) tab-delimited text file and storing the contents into MongoDB.
package get_alternatenames;
import java.io.BufferedReader;
import java.io.FileReader;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import java.util.Set;
/**
 *
 * @author cbmeeks
 */
public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        String alternateNamesFileName = "/Users/cbmeeks/Projects/GetData/geonames/alternateNames.txt";
        String line;
        // MongoDB
        Mongo m = new Mongo("localhost", 27017);
        DB db = m.getDB("mydb");
        // Build AlternateNames
        DBCollection altNames = db.getCollection("alternatenames");
        BufferedReader bReader = new BufferedReader(new FileReader(alternateNamesFileName));
        int isPreferredName = 0;
        int isShortName = 0;
        int lines = 0;
        System.out.println("Starting AlternateNames import...");
        while ((line = bReader.readLine()) != null) {
            String l[] = line.split("\t");
            BasicDBObject altName = new BasicDBObject();
            altName.put("alternateNameId", l[0]);
            altName.put("geonameId", l[1]);
            altName.put("isoLanguage", l[2]);
            altName.put("alternateName", l[3]);
            isPreferredName = 0;
            isShortName = 0;
            try {
                if (l[4] != null) {
                    isPreferredName = Integer.parseInt(l[4]);
                }
            } catch (ArrayIndexOutOfBoundsException ex) {
                isPreferredName = 0;
            } catch (Exception ex) {
                isPreferredName = 0;
            }
            try {
                if (l[5] != null) {
                    isShortName = Integer.parseInt(l[5]);
                }
            } catch (ArrayIndexOutOfBoundsException ex) {
                isShortName = 0;
            } catch (Exception ex) {
                isShortName = 0;
            }
            altName.put("isPreferredName", isPreferredName);
            altName.put("isShortName", isShortName);
            altNames.insert(altName);
            lines++;
        }
        bReader.close();
        System.out.println("Number of lines parsed: " + lines);
        System.out.println("Creating indexes...");
        altNames.createIndex(new BasicDBObject("geonameId", 1));
        altNames.createIndex(new BasicDBObject("isoLanguage", 1));
        altNames.createIndex(new BasicDBObject("alternateName", 1));
    }
}
I know this isn't the most beautiful code in the world. And it actually seems to work until around the end. It successfully imports 5.4 million records and then ends with:
Starting AlternateNames import...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
Java Result: 1
BUILD SUCCESSFUL (total time: 2 minutes 58 seconds)
I can't seem to find what the problem is. I've tried to search the text file to find a problem but at 192MB, nothing seems to be able to handle it except MacVIM and I can't quite get my head around that program. lol
But I am sure it isn't finishing the file. When I go to the last record imported in the text file (based on the record count in MongoDB) it appears to look fine...but I could be missing something.
Any suggestions?
Thanks.
BTW, kudos to Java for parsing that text file in under 3 minutes...
Upvotes: 0
Views: 1730
Reputation: 11420
Here is my corrected code that works. Thanks all for the tips.
package get_alternatenames;
import java.io.BufferedReader;
import java.io.FileReader;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
/**
 *
 * @author cbmeeks
 */
public class Main {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        String alternateNamesFileName = "/Users/cbmeeks/Projects/GetData/geonames/alternateNames.txt";
        String line;
        // MongoDB
        Mongo m = new Mongo("localhost", 27017);
        DB db = m.getDB("MyDB");
        // Build AlternateNames
        DBCollection altNames = db.getCollection("alternatenames");
        BufferedReader bReader = new BufferedReader(new FileReader(alternateNamesFileName));
        int isPreferredName = 0;
        int isShortName = 0;
        int lines = 0;
        System.out.println("Starting AlternateNames import...");
        while ((line = bReader.readLine()) != null) {
            try {
                String l[] = line.split("\t");
                // The first four columns are required; the two flag columns are optional
                if (l.length >= 4) {
                    BasicDBObject altName = new BasicDBObject();
                    altName.put("alternateNameId", Integer.parseInt(l[0]));
                    altName.put("geonameId", Integer.parseInt(l[1]));
                    altName.put("isoLanguage", l[2]);
                    altName.put("alternateName", l[3]);
                    isPreferredName = 0;
                    isShortName = 0;
                    if (l.length >= 5) {
                        isPreferredName = Integer.parseInt(l[4]);
                    }
                    if (l.length >= 6) {
                        isShortName = Integer.parseInt(l[5]);
                    }
                    altName.put("isPreferredName", isPreferredName);
                    altName.put("isShortName", isShortName);
                    altNames.insert(altName);
                    lines++;
                }
            } catch (Exception ex) {
                // Report the bad line instead of dropping it silently, then keep importing
                System.err.println("Skipping malformed line: " + line);
            }
        }
        bReader.close();
        System.out.println("Number of lines parsed: " + lines);
        System.out.println("Creating indexes...");
        altNames.createIndex(new BasicDBObject("geonameId", 1));
        altNames.createIndex(new BasicDBObject("isoLanguage", 1));
        altNames.createIndex(new BasicDBObject("alternateName", 1));
    }
}
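One caveat with the code above: the catch-all around the parsing also catches NumberFormatException, so a line whose isPreferredName column is empty but whose isShortName column is set would be skipped instead of imported, because Integer.parseInt("") throws. Below is a minimal sketch of a more forgiving flag parser. FlagParser and parseFlag are hypothetical names, and treating a missing, empty, or non-numeric column as 0 is my assumption about what the data means, not something the file format guarantees.

```java
class FlagParser {
    // Returns the flag stored at the given column, or 0 when the column is
    // missing, empty, or not a number. Mapping all three cases to 0 is an
    // assumption made for this sketch.
    static int parseFlag(String[] fields, int index) {
        if (index >= fields.length) {
            return 0;
        }
        String value = fields[index].trim();
        if (value.isEmpty()) {
            return 0;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException ex) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Made-up sample line: empty isPreferredName column, isShortName set to 1.
        String[] fields = "1\t10\ten\tLondon\t\t1".split("\t");
        System.out.println("isPreferredName=" + parseFlag(fields, 4));
        System.out.println("isShortName=" + parseFlag(fields, 5));
    }
}
```

With something like this, the two flag assignments could become parseFlag(l, 4) and parseFlag(l, 5), and the outer catch would only fire for a bad alternateNameId or geonameId.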
Upvotes: 0
Reputation: 57284
This section
while ((line = bReader.readLine()) != null) {
    String l[] = line.split("\t");
    BasicDBObject altName = new BasicDBObject();
    altName.put("alternateNameId", l[0]);
    altName.put("geonameId", l[1]);
    altName.put("isoLanguage", l[2]);
    altName.put("alternateName", l[3]);
is the only section where you access array elements by index without a try/catch for ArrayIndexOutOfBoundsException around it, so the exception has to be thrown somewhere in here. It will go boom on any line with fewer than 4 elements. Wrap the whole thing in a try/catch, or do as Bala suggests and test the length of l before entering that part of the code.
I'd want some kind of check pretty much anywhere you pull in data from an outside source and need correct content for things to work properly.
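A minimal sketch of such a check, assuming the four required columns from the question. The names FieldCheck, hasRequiredFields and findShortLines are made up for illustration, and the reader is fed from an in-memory string instead of the real file. Rather than crashing, it collects the line numbers that are too short, which also tells you where to look in the 192MB file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FieldCheck {
    // The question reads l[0]..l[3] unconditionally, so four fields are required.
    static final int REQUIRED_FIELDS = 4;

    static boolean hasRequiredFields(String line) {
        return line.split("\t").length >= REQUIRED_FIELDS;
    }

    // Returns the 1-based numbers of the lines that are too short to import.
    static List<Integer> findShortLines(BufferedReader reader) throws IOException {
        List<Integer> shortLines = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (!hasRequiredFields(line)) {
                shortLines.add(lineNumber);
            }
        }
        return shortLines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: line 2 is blank and line 3 has only three fields.
        String sample = "1\t10\ten\tLondon\t1\n"
                + "\n"
                + "2\t11\tfr\n"
                + "3\t12\tde\tBerlin\n";
        BufferedReader reader = new BufferedReader(new StringReader(sample));
        System.out.println("Short lines: " + findShortLines(reader)); // prints "Short lines: [2, 3]"
    }
}
```

In the import loop, the same hasRequiredFields test would guard the four put calls, with the short lines logged or counted so you know how many records were left out.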
Upvotes: 1
Reputation: 86718
Since you haven't indicated what line your exception is on, I'm going to use my psychic debugging skills.
My psychic powers are telling me that you have a blank line at the end of your file, and when you go to look for the fields in it, you get an exception because there are no fields on a blank line.
Either look for a blank line, or don't try to look for fields that aren't there.
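To make that concrete, here is a small sketch of what String.split returns for a blank line and for a line that ends in empty columns. SplitDemo and its two helper methods are names I made up, and the sample strings are invented, not taken from the actual file. The point is that a blank line still yields one (empty) element, so l[0] works and l[1] is the first access that throws, and that the default split silently drops trailing empty fields.

```java
import java.util.Arrays;

class SplitDemo {
    // Default split: trailing empty strings are removed from the result.
    static int fieldCount(String line) {
        return line.split("\t").length;
    }

    // A negative limit keeps trailing empty strings.
    static int fieldCountKeepingEmpties(String line) {
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        String blank = "";
        String trailingTabs = "1\t10\ten\tLondon\t\t";

        // A blank line gives a one-element array holding the empty string.
        System.out.println(Arrays.toString(blank.split("\t")) + " length=" + fieldCount(blank));

        // Two empty trailing columns vanish with the default split...
        System.out.println("default split: " + fieldCount(trailingTabs));

        // ...and are kept when the limit is negative.
        System.out.println("split with -1: " + fieldCountKeepingEmpties(trailingTabs));
    }
}
```

So even a well-formed row can come back with only four elements if its last columns are empty, which is why indexing l[4] or l[5] without a length check is unsafe.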
Upvotes: 0
Reputation: 108947
Why don't you add an array length check like this?
String l[] = line.split("\t");
// l[0]..l[3] are required; the two flag columns may be missing
if (l.length >= 4)
{
    BasicDBObject altName = new BasicDBObject();
    altName.put("alternateNameId", l[0]);
    altName.put("geonameId", l[1]);
    altName.put("isoLanguage", l[2]);
    altName.put("alternateName", l[3]);
...
Upvotes: 2