Reputation: 2053
I am trying to load a large text file (between 400-800MB) and insert its records into a database, but I am running into performance problems and memory issues (not enough heap space). I was wondering if there is a better approach than what I am currently doing.
The text file that I am loading has a simple format, something like:
00000 Andy 8920 N UNL ...
00001 Roger 4428 N TRX ...
...
Current Approach: read each line, get the fields, and build the query
ArrayList<ArrayList<String>> fields = new ArrayList<ArrayList<String>>();
while ((line = br.readLine()) != null) {
    ArrayList<String> data = new ArrayList<String>();
    if (line.length() >= 6)
        data.add(line.substring(0, 6));
    if (line.length() >= 15)
        data.add(line.substring(6, 15));
    if (line.length() >= 20)
        data.add(line.substring(15, 20));
    if (line.length() >= 25)
        data.add(line.substring(20, 25));
    ...
    fields.add(data); // it looks like [[00000, Andy , 8920, ...], [00001, Roger, ...]]
} // end read
System.gc();
db.insertValues(fields);
DB Code
public void insertValues(ArrayList<ArrayList<String>> data) {
    PreparedStatement ps = null;
    Connection con = null;
    try {
        con = getConnection();
        ps = con.prepareStatement("Insert into CUST_ACCT "
                + "(CID,NAME,R_NUM,CKM_IND,DATE_1,DATE_2,DATE_3,DATE_4,DATE_5,DATE_6,DATE_7,DATE_8,DATE_9,DATE_10,NUMBER_1,NUMBER_2,NUMBER_3,NUMBER_4,NUMBER_5,NUMBER_6,NUMBER_7,NUMBER_8,NUMBER_9,NUMBER_10,STRING_1,STRING_2,STRING_3,STRING_4,STRING_5,STRING_6,STRING_7,STRING_8,STRING_9,STRING_10,GUID,PARN_GUID,LAST_UPDT_DATE_TIME_STAMP)"
                + " values "
                + "(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,sysdate)");
        for (int i = 0; i < data.size(); i++) {
            ps.setString(1, data.get(i).get(0)); // 0
            ps.setString(2, data.get(i).get(1)); // 1
            ps.setString(3, data.get(i).get(2)); // 2
            ps.setString(4, data.get(i).get(3)); // 3
            ...
            ps.addBatch();
        }
        int[] i = ps.executeBatch();
        log.info("total records inserted: " + i.length);
    } catch (SQLException e) {
        log.error("insert failed", e);
    } finally {
        // always release JDBC resources
        try { if (ps != null) ps.close(); } catch (SQLException ignore) { }
        try { if (con != null) con.close(); } catch (SQLException ignore) { }
    }
}
However, I am getting "Not enough heap space" errors. I have also attempted to build the query and insert records one by one, but after an hour it had only inserted around 20k records out of millions. Is there a better way to load the data?
Upvotes: 0
Views: 136
Reputation: 4987
Don't read the whole file into memory - read 1000 lines, insert them using a prepared statement, and commit the transaction after each chunk. Then read another 1000, and so on.
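A minimal sketch of that idea, assuming the fixed-width offsets and the CUST_ACCT table from the question (the column list is shortened here, so adapt it to the real one):
try (BufferedReader br = new BufferedReader(new FileReader(path));
     Connection con = getConnection()) {
    con.setAutoCommit(false); // commit per chunk, not per row
    try (PreparedStatement ps = con.prepareStatement(
            "Insert into CUST_ACCT (CID,NAME,R_NUM,CKM_IND) values (?,?,?,?)")) {
        String line;
        int count = 0;
        while ((line = br.readLine()) != null) {
            ps.setString(1, line.substring(0, 6));
            ps.setString(2, line.substring(6, 15));
            ps.setString(3, line.substring(15, 20));
            ps.setString(4, line.substring(20, 25));
            ps.addBatch();
            if (++count % 1000 == 0) {
                ps.executeBatch(); // send this chunk to the database
                con.commit();      // end the transaction so memory is freed
            }
        }
        ps.executeBatch(); // flush the final partial chunk
        con.commit();
    }
}
This way only one chunk of rows is ever held in memory at a time, instead of the whole file.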
Also, Oracle has special tools for loading data - look up SQL*Loader and Data Pump.
Upvotes: 1
Reputation: 6836
You load the whole file into memory and then try to insert all of it at once, and this leads to performance and memory problems (heap space, etc.).
You could read the file using a Scanner; this way you read it line by line without loading the whole file into memory.
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // db insert!
    }
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
Otherwise, you can use Apache Commons IO:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        // db insert
    }
} finally {
    LineIterator.closeQuietly(it);
}
For enhanced performance, I suggest you open the connection only once:
// your logic....
Connection con = null;
LineIterator it = null;
try {
    con = getConnection(); // open the connection once, before the loop
    it = FileUtils.lineIterator(theFile, "UTF-8");
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        insertValues(con, line);
        // other logic
    }
    // checking exceptions etc.
} finally {
    LineIterator.closeQuietly(it);
    if (con != null) {
        con.close();
    }
}
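For reference, a hypothetical per-line insertValues matching the call above - the column list and fixed-width offsets are assumptions based on the question, not a drop-in implementation:
public void insertValues(Connection con, String line) throws SQLException {
    // Hypothetical helper: offsets and column names are assumed from the question.
    try (PreparedStatement ps = con.prepareStatement(
            "Insert into CUST_ACCT (CID,NAME,R_NUM,CKM_IND) values (?,?,?,?)")) {
        ps.setString(1, line.substring(0, 6));
        ps.setString(2, line.substring(6, 15));
        ps.setString(3, line.substring(15, 20));
        ps.setString(4, line.substring(20, 25));
        ps.executeUpdate(); // one row per call; batching with addBatch() would be faster
    }
}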
Summing it up: I hope this helps... these are simple examples and you need to change them based on your needs!
Upvotes: 1
Reputation: 1092
Let me see if I correctly understood your needs:
You have a large file, and each line from the file needs to be inserted into one or more tables in the database. Did I understand correctly?
If yes, have you tried using the "SQL*Loader" tool from Oracle? I haven't tested it with such a large file, but it may be a solution. You can call it from your Java app.
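If you want to call it from your Java app, a minimal sketch using ProcessBuilder (the credentials, control file name, and log file name are placeholders you would replace):
ProcessBuilder pb = new ProcessBuilder(
        "sqlldr", "userid=user/password@db", "control=cust_acct.ctl", "log=load.log");
pb.inheritIO(); // show SQL*Loader's output in this process's console
Process p = pb.start();
int exitCode = p.waitFor(); // exit code 0 means the load completed cleanly
System.out.println("sqlldr exit code: " + exitCode);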
Upvotes: -1