logger

Reputation: 2053

Building dynamic query for large files

I am trying to load a large text file (between 400-800MB) and insert its records into a database, but I am running into performance problems and memory issues (not enough heap space). I was wondering if there is a better approach than what I am currently doing.

The text file that I am loading has a simple format; it looks something like this:

00000  Andy   8920  N  UNL  ...
00001  Roger  4428  N  TRX  ...
...

Current Approach: read each line, get the fields, and build the query

ArrayList<ArrayList<String>> fields = new ArrayList<ArrayList<String>>();
while ((line = br.readLine()) != null) {
    ArrayList<String> data = new ArrayList<String>();
    if(line.length() >= 6)
        data.add(line.substring(0, 6)); 
    if(line.length() >= 24)
        data.add(line.substring(6, 15));  
    if(line.length() >= 30)
        data.add(line.substring(15, 20)); 
    if(line.length() >= 48)
        data.add(line.substring(20, 25));
...
    fields.add(data); //it looks like [[00000, Andy   , 8920,..],[00001, Roger, ...]]
} //end read
System.gc();
db.insertValues(fields);

DB Code

public void insertValues(ArrayList<ArrayList<String>> data) {
    PreparedStatement ps = null;
    Connection con = null;
    try {
        con = getConnection();
        ps = con.prepareStatement("Insert into CUST_ACCT "
                + "(CID,NAME,R_NUM,CKM_IND,DATE_1,DATE_2,DATE_3,DATE_4,DATE_5,DATE_6,DATE_7,DATE_8,DATE_9,DATE_10,NUMBER_1,NUMBER_2,NUMBER_3,NUMBER_4,NUMBER_5,NUMBER_6,NUMBER_7,NUMBER_8,NUMBER_9,NUMBER_10,STRING_1,STRING_2,STRING_3,STRING_4,STRING_5,STRING_6,STRING_7,STRING_8,STRING_9,STRING_10,GUID,PARN_GUID,LAST_UPDT_DATE_TIME_STAMP)"
                + " values "
                + "(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,sysdate)");
        for (int i = 0; i < data.size(); i++) {
            ps.setString(1, data.get(i).get(0)); //0
            ps.setString(2, data.get(i).get(1)); //1
            ps.setString(3, data.get(i).get(2)); //2
            ps.setString(4, data.get(i).get(3)); //3
            ...
            ps.addBatch();
        }
        int[] counts = ps.executeBatch();
        log.info("total of records inserted: " + counts.length);
    } catch (SQLException e) {
        log.error("insert failed", e);
    } finally {
        try { if (ps != null) ps.close(); } catch (SQLException e) { /* ignore */ }
        try { if (con != null) con.close(); } catch (SQLException e) { /* ignore */ }
    }
}

However, I am getting "Not enough heap space" errors. I have also attempted to build the query and insert the records one by one, but after an hour it had only inserted around 20k records out of millions. Is there a better way to load the data?

Upvotes: 0

Views: 136

Answers (3)

Admit

Reputation: 4987

Don't read the whole file - read 1000 lines, insert them using a prepared statement, and commit the transaction after that. Then read the next 1000, and so on.
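
A rough sketch of that idea (the chunk size, the shortened column list, and the parseLine/getConnection helpers are placeholders - adapt them to your own parsing and column mapping):

// Sketch: read the file in chunks of 1000 lines, batch-insert each chunk, commit, repeat.
// CHUNK_SIZE, parseLine(...) and the shortened CUST_ACCT column list are illustrative only.
final int CHUNK_SIZE = 1000;
try (BufferedReader br = new BufferedReader(new FileReader(path));
     Connection con = getConnection();
     PreparedStatement ps = con.prepareStatement(
             "insert into CUST_ACCT (CID, NAME, R_NUM, CKM_IND) values (?,?,?,?)")) {
    con.setAutoCommit(false);
    String line;
    int count = 0;
    while ((line = br.readLine()) != null) {
        String[] f = parseLine(line);      // your fixed-width parsing
        ps.setString(1, f[0]);
        ps.setString(2, f[1]);
        ps.setString(3, f[2]);
        ps.setString(4, f[3]);
        ps.addBatch();
        if (++count % CHUNK_SIZE == 0) {
            ps.executeBatch();             // send this chunk to the database
            con.commit();                  // commit so rows don't pile up in memory
        }
    }
    ps.executeBatch();                     // flush the last partial chunk
    con.commit();
}

This way only one chunk of rows is ever held in the JDBC batch at a time.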

Also, Oracle has special tools for loading data (google SQL*Loader and Data Pump).

Upvotes: 1

Marcx

Reputation: 6836

You load the whole file into memory and then try to process all of it line by line, and this leads to performance and memory problems (heap space, etc.).

You could read the file using a Scanner; this way it is read line by line without loading the whole file into memory.

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // db insert!
    }
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

Alternatively, using Apache Commons IO:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        // db insert
    }
} finally {
    LineIterator.closeQuietly(it);
}

For enhanced performance I suggest you open the connection only once:

   // your logic....
   Connection con = null;
   LineIterator it = null;
   try {
       con = getConnection();            // open the connection only once
       it = FileUtils.lineIterator(theFile, "UTF-8");
       while (it.hasNext()) {
           String line = it.nextLine();
           // do something with line
           insertValues(con, line);      // pass the connection to your insert method
           // other logic
       }
       // checking exceptions etc.
   } finally {
       LineIterator.closeQuietly(it);
       if (con != null) {
           con.close();
       }
   }
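
The insertValues(con, line) helper above is only referenced, not shown; one way it could look is sketched below (the shortened column list and substring positions are assumptions taken from the question, not tested code):

// Hypothetical helper: reuses the already-open connection passed in by the caller.
// Column names and field positions are assumptions based on the question.
public void insertValues(Connection con, String line) throws SQLException {
    try (PreparedStatement ps = con.prepareStatement(
            "insert into CUST_ACCT (CID, NAME, R_NUM, CKM_IND) values (?,?,?,?)")) {
        ps.setString(1, line.substring(0, 6).trim());
        ps.setString(2, line.substring(6, 15).trim());
        ps.setString(3, line.substring(15, 20).trim());
        ps.setString(4, line.substring(20, 25).trim());
        ps.executeUpdate();   // one row per call; batching (see the other answer) is faster
    }
}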

Summing it up:

  1. Read the file line by line without loading it into memory.
  2. Open the connection only once (or a few times, not for every insert).
  3. Pass the connection object to your insert method.
  4. Close everything when done.

Hope you understand... these are simple examples and you need to change them based on your needs!

Upvotes: 1

mikcutu

Reputation: 1092

Let me see if I correctly understood your needs:

You have a large file, and each line from the file needs to be inserted into one or more tables in the database. Did I understand correctly?

If yes, have you tried using the SQL*Loader tool from Oracle? I didn't test it with such a large file, but it may be a solution. You can call it from your Java app.
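
If you go that route, a rough sketch of invoking sqlldr from Java follows (the credentials, control file name, and file paths are placeholders you would replace with your own):

// Hypothetical example: launch SQL*Loader as an external process.
// "user/password@db", "cust_acct.ctl" and the data/log file names are placeholders.
ProcessBuilder pb = new ProcessBuilder(
        "sqlldr",
        "userid=user/password@db",
        "control=cust_acct.ctl",
        "data=customers.txt",
        "log=cust_acct.log");
pb.redirectErrorStream(true);              // merge stdout and stderr
Process p = pb.start();
try (BufferedReader out = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
    String line;
    while ((line = out.readLine()) != null) {
        System.out.println(line);          // echo SQL*Loader output
    }
}
int exitCode = p.waitFor();                // 0 usually means all rows loaded
System.out.println("sqlldr finished with exit code " + exitCode);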

Upvotes: -1
