Reputation: 856
I have 26 CSV files that I want to grab from the internet on a nightly basis and upload into a PostgreSQL table. I have this working using Java with PreparedStatement and batched inserts. Even so, performance is painfully slow: it takes about 30 minutes to fetch the 6000 or so rows and insert them into PostgreSQL. This is my first time doing something like this, so I don't have a reference point for whether that is fast or slow.
To fetch each file, I am using this code:
URL grabberUrl = new URL(csvUrl);
URLConnection grabberConn = grabberUrl.openConnection();
BufferedReader grabberReader = new BufferedReader(new InputStreamReader(grabberConn.getInputStream()));
I am then using a PreparedStatement, taking values from the input stream and setting them:
con = DriverManager.getConnection(url, user, password);
pst = con.prepareStatement("insert into blah(name, year) values(?, ?)");
pst.setString(1, name);
pst.setString(2, year);
I am then batching up the inserts. I've tried batch sizes from 100 to 1000 with no meaningful change in performance.
pst.addBatch();
if (count == 100) {
    count = 0;
    pst.executeBatch();
}
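Put together, the loop looks roughly like this (the CSV parsing and the final flush of the last partial batch are simplified here, not my exact code):

// Rough sketch of how the fragments above fit together.
con = DriverManager.getConnection(url, user, password);
pst = con.prepareStatement("insert into blah(name, year) values(?, ?)");

String line;
int count = 0;
while ((line = grabberReader.readLine()) != null) {
    String[] fields = line.split(",");   // assumes a simple two-column CSV
    pst.setString(1, fields[0]);         // name
    pst.setString(2, fields[1]);         // year
    pst.addBatch();
    count++;
    if (count == 100) {
        count = 0;
        pst.executeBatch();
    }
}
pst.executeBatch();   // flush whatever is left in the final partial batch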
Has anyone got any suggestions as to what I can do to make things faster?
Upvotes: 3
Views: 1310
Reputation: 48287
Try the following:
PGConnection con = (PGConnection) DriverManager.getConnection(...);
CopyManager copyManager = con.getCopyAPI();
copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);
If mytable is heavily indexed, then drop the indexes, load, and recreate the indexes.
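A rough sketch of that sequence (the index name and indexed column are placeholders, not taken from the question; with a pooled connection, unwrap to PGConnection rather than casting):

// Uses java.sql.Connection/Statement plus org.postgresql.PGConnection and
// org.postgresql.copy.CopyManager from the PostgreSQL JDBC driver.
Connection con = DriverManager.getConnection(url, user, password);
try (Statement st = con.createStatement()) {
    st.execute("drop index if exists mytable_name_idx");       // drop before the bulk load

    CopyManager copyManager = con.unwrap(PGConnection.class).getCopyAPI();
    copyManager.copyIn("copy mytable from stdin with (format csv)", grabberReader);

    st.execute("create index mytable_name_idx on mytable (name)");  // rebuild afterwards
}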
Upvotes: 1
Reputation: 3012
If the files are accessible from the PostgreSQL server itself, try using the COPY statement; see http://www.postgresql.org/docs/9.3/static/sql-copy.html
Also, if you know the data quality, you can temporarily remove any table constraints and drop any indexes. You can add the constraints and indexes back after loading the data.
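For example, run from JDBC (the server-side path is a placeholder; the file must be readable by the PostgreSQL server process itself, not just by the Java client, and the database role needs permission to read server-side files):

// Sketch only: the path and column list are illustrative.
try (Connection con = DriverManager.getConnection(url, user, password);
     Statement st = con.createStatement()) {
    // Optionally drop indexes/constraints here if the data is known to be clean,
    // then re-create them after the load.
    st.execute("copy blah(name, year) from '/path/on/server/file.csv' with (format csv)");
}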
Upvotes: 1