user3287992

Reputation: 31

How to improve file reading efficiency and data insertion in Java

We have an Autosys job running in production on a daily basis. It calls a shell script, which in turn calls a Java servlet. This servlet reads the files and inserts the data into two different tables, then does some processing. The Java version is 1.6, the application server is WAS7, and the database is Oracle 11g.

We get several issues with this process: it takes a long time, runs out of memory, and so on. Below are the details of how we have coded this process. Please let me know if it can be improved.

  1. When we read the file using BufferedReader, do we really get a lot of strings created in memory, one for each line returned by the readLine() method? These files contain 400,000-500,000 lines (4-5 lakhs), with records separated by newline characters. Is there a better way to read files in Java for efficiency? I couldn't find any, given that the record lines in the file are of variable length.

  2. When we insert the data, we do a batch process with Statement/PreparedStatement, making one batch containing all the records of the file. Does breaking it into smaller batches really give better performance?

  3. If the tables have no indexes or other constraints defined, and all the columns are of VARCHAR type, which operation will be faster: inserting a new row, or updating an existing row based on some matching condition?

Upvotes: 3

Views: 186

Answers (1)

Alfred Xiao

Reputation: 1788

  1. Reading the File

    It is fine to use BufferedReader. The key is to read a batch of lines, process them, then read the next batch, and so on. The important implication is that while you process the second batch of lines, you no longer reference the previous batch; this way you ensure you don't retain memory unnecessarily. If, however, you keep references to all the lines at once, you are likely to run into memory issues.

    If you do need to reference all the lines, you can either increase your heap size or, if many of the lines are duplicates, use String.intern() or a similar deduplication technique to save memory.
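A minimal sketch of this chunked approach; the chunk size of 10,000 and the handleChunk() method are illustrative placeholders (handleChunk() would do the per-chunk database work):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class ChunkedFileReader {
    static final int CHUNK_SIZE = 10000; // hypothetical; tune against your heap

    // Reads lines in fixed-size chunks so only one chunk is reachable at a time.
    public static int process(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        List<String> chunk = new ArrayList<String>(CHUNK_SIZE);
        int total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == CHUNK_SIZE) {
                total += handleChunk(chunk);
                chunk.clear(); // drop references so the strings can be collected
            }
        }
        if (!chunk.isEmpty()) {
            total += handleChunk(chunk);
        }
        return total;
    }

    // Placeholder for the real per-chunk work (e.g. batch insert); returns rows handled.
    static int handleChunk(List<String> lines) {
        return lines.size();
    }
}
```

Because the list is cleared after each chunk, the garbage collector can reclaim the previous chunk's strings while the next one is being read.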

  2. Modifying the Table

    It is always better to limit the size of a batch to a reasonable count. The larger the batch, the more resource pressure you impose on the database end, and probably on your JVM side as well.
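A hedged sketch of a bounded-batch insert; the table/column names (staging_table, col_a, col_b) and the batch size of 1000 are purely illustrative and should be tuned against your own database. The flushCount() helper just shows how many executeBatch() calls a given row count produces:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInserter {
    static final int BATCH_SIZE = 1000; // hypothetical; tune for your database and heap

    // Number of executeBatch() round trips needed for a given row count.
    static int flushCount(int rows) {
        return (rows + BATCH_SIZE - 1) / BATCH_SIZE;
    }

    // Flushes every BATCH_SIZE rows instead of accumulating the whole file in one batch.
    public static void insertAll(Connection conn, List<String[]> rows) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO staging_table (col_a, col_b) VALUES (?, ?)");
        try {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch(); // bounded round trip; JDBC driver buffer stays small
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
        } finally {
            ps.close();
        }
    }
}
```

Keeping the batch bounded means neither the JDBC driver's buffer nor the database's per-statement resources grow with the file size.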

  3. Insert or Update

    If you have indexes defined, I would say updating performs better; if you don't have indexes, insert should be better. (Since you have access to the environment, perhaps you can run a test and share the result with us?)
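As an aside, if the real requirement is "update when a match exists, otherwise insert", Oracle 11g can do both in a single statement with MERGE rather than two round trips per row; a sketch with purely illustrative table and column names:

```sql
MERGE INTO target_table t
USING (SELECT ? AS col_a, ? AS col_b FROM dual) s
ON (t.col_a = s.col_a)
WHEN MATCHED THEN
    UPDATE SET t.col_b = s.col_b
WHEN NOT MATCHED THEN
    INSERT (col_a, col_b) VALUES (s.col_a, s.col_b)
```

MERGE statements can also be batched through a PreparedStatement like plain inserts.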

Lastly, you can also consider using multiple threads for the "Modifying the Table" part so as to improve overall throughput.
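A sketch of that idea using an ExecutorService (available since Java 5, so fine on 1.6): the rows are split into slices, one per worker, and each worker would use its own Connection. The database work is stubbed out as a row count here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoader {
    // Splits rows into roughly equal contiguous slices, one per worker.
    public static <T> List<List<T>> partition(List<T> rows, int parts) {
        List<List<T>> slices = new ArrayList<List<T>>();
        if (rows.isEmpty()) {
            return slices;
        }
        int size = (rows.size() + parts - 1) / parts;
        for (int i = 0; i < rows.size(); i += size) {
            slices.add(rows.subList(i, Math.min(i + size, rows.size())));
        }
        return slices;
    }

    // Submits one task per slice; a real task would batch-insert its slice
    // using a dedicated Connection, stubbed here as returning the slice size.
    public static int loadInParallel(List<String> rows, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final List<String> slice : partition(rows, workers)) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return slice.size(); // placeholder for the real insert work
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // propagates any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that JDBC Connections are not thread-safe, so each worker needs its own connection (typically from the WAS connection pool), and the gain depends on the database being able to absorb the concurrent batches.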

Upvotes: 1
