Reputation: 157
I need to query a database and copy the result set into another database, which has the same structure and is on the same host system.
The following Java function works pretty well (fast and without errors) as long as the query result is fairly small:
public void copyTableData(Connection dbConnOnSrcDB, Connection dbConnOnDestDB,
        String sqlQueryOnSrcDB, String tableNameOnDestDB) throws SQLException {
    try (
            PreparedStatement prepSqlStatmOnSrcDB = dbConnOnSrcDB.prepareStatement(sqlQueryOnSrcDB);
            ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery()
    ) {
        ResultSetMetaData sqlMetaResults = sqlResultsFromSrcDB.getMetaData();
        // Collect the column names of the query result
        List<String> columnsOfQuery = new ArrayList<>();
        for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++) {
            columnsOfQuery.add(sqlMetaResults.getColumnName(i));
        }
        // Build the INSERT statement for the destination table
        try (
                PreparedStatement prepSqlStatmOnDestDB = dbConnOnDestDB.prepareStatement(
                        "INSERT INTO " + tableNameOnDestDB +
                        " (" + columnsOfQuery.stream().collect(Collectors.joining(", ")) + ") " +
                        "VALUES (" + columnsOfQuery.stream().map(c -> "?").collect(Collectors.joining(", ")) + ")")
        ) {
            // Copy every row of the source result set into the destination table
            while (sqlResultsFromSrcDB.next()) {
                for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++) {
                    prepSqlStatmOnDestDB.setObject(i, sqlResultsFromSrcDB.getObject(i));
                }
                prepSqlStatmOnDestDB.addBatch();
            }
            prepSqlStatmOnDestDB.executeBatch();
        }
    }
}
But my queries produce very large result sets, in the range of several hundred megabytes.
Problem A: An OutOfMemoryError is thrown as soon as the following line of code is executed:
ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery()
Java exception:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.getDeclaredField(Class.java:2068)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:323)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:321)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:320)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:110)
at java.sql.SQLException.<clinit>(SQLException.java:372)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2156)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:117)
at Application.copyTableData(Application.java:159)
at Application.main(Application.java:585)
Problem B: The copy job takes a very long time. Is there a way to speed up the copy process?
The tables to copy are:
String[] tables = new String[]{
        "table1",
        "table1_properties",
        "table1_addresses",
        "table2",
        "table3",
        "table4",
        "table5",
        "table6",
        "table7",
        "table8",
        "table9",
        "table10"
};
Function call:
for (String table : tables) {
    getDataFromSrcDB = "SELECT " + table + ".* " +
            "FROM table1 " +
            "FULL JOIN table1_properties " +
            "ON table1_properties.d_id=table1.d_id " +
            "FULL JOIN table1_addresses " +
            "ON table1_addresses.d_id=table1_properties.d_id " +
            "FULL JOIN table2 " +
            "ON table2.p_id=table1_properties.p_id " +
            "FULL JOIN table3 " +
            "ON table3.d_id=table1.d_id " +
            "FULL JOIN table4 " +
            "ON table4.d_id=table1.d_id " +
            "FULL JOIN table5 " +
            "ON table5.d_id=table1.d_id " +
            "FULL JOIN table6 " +
            "ON table6.d_id=table1.d_id " +
            "FULL JOIN table7 " +
            "ON table7.d_id=table1.d_id " +
            "FULL JOIN table8 " +
            "ON table8.id=table4.id " +
            "FULL JOIN table9 " +
            "ON table9.d_id=table1.d_id " +
            "FULL JOIN table10 " +
            "ON table10.a_id=table1_addresses.a_id " +
            "WHERE ST_Intersects(ST_MakeEnvelope(" +
            minLong + "," + minLat + "," + maxLong + "," + maxLat +
            ",4326), geom :: GEOMETRY) OR " +
            "ST_Intersects(ST_MakeEnvelope(" +
            minLong + "," + minLat + "," + maxLong + "," + maxLat +
            ",4326), CAST(table3.location AS GEOMETRY))";
    copyTableData(dbConnOnSrcDB, dbConnOnDestDB, getDataFromSrcDB, table);
}
Upvotes: 2
Views: 494
Reputation: 1116
When you fetch the data directly, it fills up RAM until the whole result set has been fetched, so you can easily run into an OutOfMemoryError. If you fetch the data as a stream instead, you can handle result sets of unlimited size: the driver fetches a chunk of rows (controlled by fetchSize), you process it and continue with the next chunk, and the RAM used by already-processed rows can be freed.
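A minimal sketch of this streaming setup with the PostgreSQL JDBC driver, reusing the connection and query names from the question (the row processing is left as a placeholder). Auto-commit has to be disabled and the fetch size has to be set on the statement before the query is executed:
// Streaming read: with auto-commit off and a positive fetch size,
// the PostgreSQL driver uses a cursor and fetches rows in chunks.
dbConnOnSrcDB.setAutoCommit(false);
try (PreparedStatement prepSqlStatmOnSrcDB = dbConnOnSrcDB.prepareStatement(sqlQueryOnSrcDB)) {
    prepSqlStatmOnSrcDB.setFetchSize(50); // must be set before executeQuery()
    try (ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery()) {
        while (sqlResultsFromSrcDB.next()) {
            // process one row; only about 50 rows are held in memory at a time
        }
    }
}
dbConnOnSrcDB.commit(); // end the read transaction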
Upvotes: 0
Reputation: 59950
When the size of the batch is huge, you get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are two parts to the solution. First, split the work into smaller batches and persist the data, for example, every 1,000 rows. Second, as Mark Rotteveel mentions in the comments, you also need some driver configuration, as the documentation explains under Getting results based on a cursor:
By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.
So, in this case, your code can look like this:
// Disable auto-commit on the source connection,
// otherwise the PostgreSQL driver cannot use a cursor
dbConnOnSrcDB.setAutoCommit(false);
final int batchSize = 1000;
final int fetchSize = 50;
int count = 0;
...
// Set the fetch size on the source statement; this must happen
// before executeQuery(), or the driver has already fetched everything
prepSqlStatmOnSrcDB.setFetchSize(fetchSize);
while (sqlResultsFromSrcDB.next()) {
    for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++) {
        prepSqlStatmOnDestDB.setObject(i, sqlResultsFromSrcDB.getObject(i));
    }
    prepSqlStatmOnDestDB.addBatch();
    if (++count % batchSize == 0) {
        prepSqlStatmOnDestDB.executeBatch();
    }
}
prepSqlStatmOnDestDB.executeBatch(); // insert remaining records
Because you are using PostgreSQL, you could also use dblink to transfer data directly from one database to the other.
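For example, you can run an INSERT ... SELECT on the destination that pulls the rows through dblink. This is only a sketch: the connection string 'dbname=srcdb' and the column definition list are assumptions that have to match your actual schema, and the dblink extension has to be installed on the destination database (CREATE EXTENSION dblink):
// Hypothetical sketch: 'dbname=srcdb' and the column list (d_id, name)
// are placeholders for the real source database and table definition.
String dblinkSql =
        "INSERT INTO table1 " +
        "SELECT * FROM dblink('dbname=srcdb', 'SELECT d_id, name FROM table1') " +
        "AS t(d_id integer, name text)";
try (Statement stmt = dbConnOnDestDB.createStatement()) {
    // The rows never pass through the JVM; PostgreSQL copies them itself
    stmt.executeUpdate(dblinkSql);
}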
Upvotes: 4
Reputation: 886
There are many ways to achieve this. Here are a few options you can apply:
Read the data from the first database, write it to a CSV file, and then read the CSV file back in chunks and write it into the other database (easy to implement, but more coding; see the sketch after this list): https://gauravmutreja.wordpress.com/2011/10/13/exporting-your-database-to-csv-file-in-java/
If you don't need much data manipulation before the transfer, you can write a simple DB function to read the data from one database and write it into the other.
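For option 1, since both databases are PostgreSQL, the CSV detour can be driven by the JDBC driver's CopyManager instead of hand-written CSV code. This is a sketch assuming the pgjdbc driver; the table name and file path are placeholders:
import java.io.FileReader;
import java.io.FileWriter;
import java.io.Reader;
import java.io.Writer;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Export the source table to a CSV file via COPY ... TO STDOUT
// (the surrounding method must declare SQLException and IOException)
CopyManager srcCopy = dbConnOnSrcDB.unwrap(PGConnection.class).getCopyAPI();
try (Writer out = new FileWriter("/tmp/table1.csv")) {
    srcCopy.copyOut("COPY table1 TO STDOUT WITH (FORMAT csv)", out);
}
// Stream the CSV file into the destination table via COPY ... FROM STDIN
CopyManager destCopy = dbConnOnDestDB.unwrap(PGConnection.class).getCopyAPI();
try (Reader in = new FileReader("/tmp/table1.csv")) {
    destCopy.copyIn("COPY table1 FROM STDIN WITH (FORMAT csv)", in);
}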
Upvotes: 0