Reputation: 195
I am working on a java program that parses files to lists and then inserts data into a DB. This runs on a server with a ton of memory. Are there java limitations I need to be aware of?
For instance, is it a bad idea to parse a GB of data into a list before inserting it into the DB?
Upvotes: 0
Views: 281
Reputation: 533530
The limits you might need to be aware of are the maximum heap size you give the JVM (-Xmx), the Integer.MAX_VALUE cap on the length of a single array (and therefore on an ArrayList), and GC pause times, which tend to grow with very large heaps.
Tons of memory is 256 - 512 GB these days, and I would suggest using off-heap memory if you need more than 32 GB in one JVM (or Zing).
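One minimal way to put data off-heap with nothing but the JDK is an NIO direct buffer (libraries like Chronicle offer much richer off-heap APIs; this is just a sketch):

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Direct buffers are allocated outside the Java heap, so they
        // don't count against -Xmx and aren't walked by the GC.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB off-heap
        buf.putLong(0, 42L);
        System.out.println(buf.getLong(0)); // prints 42
    }
}
```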
Upvotes: 0
Reputation: 96395
You have more limitations than just Java to worry about.
There's network bandwidth usage, hogging your database server CPU, filling up the database transaction log, JDBC performance for mass inserts, and slowness while the database updates its indexes or generates artificial keys.
If your inputs get too huge you need to split them into chunks and commit the chunks separately. How big is too big depends on your database.
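Chunked commits are straightforward with JDBC batching. A sketch (the table and column names are made up; adjust to your schema, and tune the chunk size for your database):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedInsert {
    private static final int CHUNK_SIZE = 1_000; // tune per database

    // Hypothetical single-column table "my_table (value)".
    public static void insertAll(Connection conn, List<String> rows) throws SQLException {
        conn.setAutoCommit(false); // commit per chunk, not per row
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO my_table (value) VALUES (?)")) {
            int count = 0;
            for (String row : rows) {
                ps.setString(1, row);
                ps.addBatch();
                if (++count % CHUNK_SIZE == 0) {
                    ps.executeBatch();
                    conn.commit(); // keeps each transaction, and the log, small
                }
            }
            ps.executeBatch(); // flush the final partial chunk
            conn.commit();
        }
    }
}
```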
The way your artificial keys get allocated can slow the process down; you may need to allocate batches of values ahead of time, such as by using a hilo generator.
Kicking off a bunch of threads and hammering the database server with them would just cause contention and make the database server work harder, as it has to sort out the transactions and make sure they don't interfere with each other.
Consider writing to some kind of delimited file, then run a bulk-insert utility to load its contents into the database. That way the database actually cooperates, it can suspend updating indexes and checking constraints, and sequences and transactions aren't an issue. It is orders of magnitude faster than JDBC.
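Producing the delimited file itself is plain I/O. A sketch using tab as the delimiter (escape or strip tabs if they can occur in your data); the resulting file can then be fed to the database's bulk loader, e.g. PostgreSQL's COPY or MySQL's LOAD DATA INFILE:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DelimitedExport {
    // Writes each row as one tab-separated line.
    public static void export(Path out, List<String[]> rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            for (String[] row : rows) {
                w.write(String.join("\t", row));
                w.newLine();
            }
        }
    }
}
```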
Upvotes: 1
Reputation: 5021
Nathan's answer is decent - so I'll only add a few bits here...
If you are not doing anything terribly sophisticated in your program, then it might be good practice to write in a streaming fashion - in simple terms, read the input a line at a time and directly output it to a file, finally calling the database's specific bulk-upload tool (most of them have one).
Reading all the lines into memory and then calling insert() in a loop would be pretty inefficient.
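The streaming approach keeps memory use constant regardless of input size. A minimal sketch (the trim/uppercase step is just a stand-in for whatever parsing you actually do):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingParse {
    // Reads one line at a time and writes the parsed form straight out,
    // so only a single line is ever held in memory.
    public static void transform(Path in, Path out) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(in);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(line.trim().toUpperCase()); // stand-in for real parsing
                w.newLine();
            }
        }
    }
}
```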
You don't give us many clues about why you are reading in this data all in one go - is there a reason for needing to do this?
Upvotes: 1
Reputation: 527
Not directly, but you may want to tweak the JVM arguments a bit.
"What are the Xms and Xmx parameters when starting JVMs?" might be a useful reference.
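For example, a launch command along these lines (the sizes here are illustrative; pick values that fit your server):

```shell
# Start with a 4 GB initial heap and allow growth up to 24 GB.
# Keeping -Xmx below ~32 GB preserves compressed object pointers.
java -Xms4g -Xmx24g -jar parser.jar
```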
Upvotes: 0