Reputation: 4132
I am now using batch:
String query = "INSERT INTO table (id, name, value) VALUES (?, ?, ?)";
PreparedStatement ps = connection.prepareStatement(query);
for (Record record : records) {
ps.setInt(1, record.id);
ps.setString(2, record.name);
ps.setInt(3, record.value);
ps.addBatch();
}
ps.executeBatch();
I am just wondering if the above code is equivalent to the following code. If not, which is faster?
String query = "INSERT INTO table (id, name, value) VALUES ";
for (Record record : records) {
query += "(" + record.id + ",'" + record.name + "'," + record.value + "),";
}
query = query.substring(0, query.length() - 1); // drop the trailing comma
PreparedStatement ps = connection.prepareStatement(query);
ps.executeUpdate();
Upvotes: 41
Views: 76898
Reputation: 736
I found this question while trying to upload a large CSV to my remote database. Based on what I experienced, the two are not the same. The first method, with batching, is the safer solution, but I also found it to be much slower.
I recognize that this is not the recommended approach for anything that accepts user input, but for my use case it was the first tractable solution I found.
My use case was this: I had a CSV with 21 million rows that I wanted to upload to my database, and I found that using prepared statement batches was much slower. As far as I can tell, this is because each insert statement is added separately to the batch, so a batch of 1000 executes 1000 insert statements. Whatever the case, this was taking about 30 seconds per batch for me with batch sizes of 1000. With 21 million rows, that would have taken two days. Therefore, I deliberately did something UNSAFE that worked much faster.
When I did it this way, each batch of 10,000 took about 0.25 seconds. This ran roughly 1000x faster than the other way, so I thought I would share it for people who are looking for the same answer I was.
For reference, the CSV file that I was using was downloaded from https://fdc.nal.usda.gov/download-datasets.html.
The readCSV function is from Jeronimo Backes's solution here: Slow CSV row parsing and splitting. https://mvnrepository.com/artifact/com.univocity/univocity-parsers
Again, don't do something like this where you are worried about injection attacks. I used it as a really fast way of reading a CSV and putting it into a MySQL database.
public static void writeFileToCSV(String filename) {
PreparedStatement stmt = null;
Connection con = null;
Integer insertCount = 0;
Integer batchSize = 10000;
Integer numBatchesSent = 0;
IterableResult<String[], ParsingContext> rows = readCSV(filename);
try {
con = mySqlDatabase.getConnection();
con.setAutoCommit(true); //I found that this did not significantly change the speed of the queries because there was not one query per row.
Integer numInBatch = 0;
ArrayList<Object> values = new ArrayList<>();
String sqlStart = "INSERT IGNORE INTO food_nutrient(id, fdc_id, nutrient_id, amount) VALUES ";
String sqlEnd = " ;";
StringBuilder sqlBuilder = new StringBuilder();
StringBuilder valuesBuilder = new StringBuilder();
Integer lineNum = 0;
//This is my manual parsing of the csv columns. There may be a slicker way to do this
for (String[] nextLine : rows) {
if (lineNum == 0) { //This ignores the header row of the csv.
lineNum++;
continue;
}
Integer id = Integer.parseInt(nextLine[0]);
Integer fdc_id = Integer.parseInt(nextLine[1]);
Integer nutrient_id = Integer.parseInt(nextLine[2]);
Double amount = parseDouble(nextLine[3]);
if (valuesBuilder.length() > 0) {
valuesBuilder.append(',');
}
valuesBuilder.append("(");
valuesBuilder.append(id);
valuesBuilder.append(',');
if (fdc_id != null) {
valuesBuilder.append(fdc_id);
valuesBuilder.append(',');
} else {
valuesBuilder.append("NULL,");
}
if (nutrient_id != null) {
valuesBuilder.append(nutrient_id);
valuesBuilder.append(',');
} else {
valuesBuilder.append("NULL,");
}
if (amount != null) {
valuesBuilder.append(amount);
valuesBuilder.append(')');
} else {
valuesBuilder.append("NULL)");
}
if (++insertCount % batchSize == 0) {
sqlBuilder.append(sqlStart);
sqlBuilder.append(valuesBuilder);
sqlBuilder.append(sqlEnd);
stmt = con.prepareStatement(sqlBuilder.toString());
sqlBuilder = new StringBuilder();
valuesBuilder = new StringBuilder();
stmt.executeUpdate();
numBatchesSent++;
System.out.println("Sent batch "+ numBatchesSent + " with " + batchSize + " new rows.");
} //: send the batch
}//: For each row in the csv
//Send the values that were in the last batch.
sqlBuilder.append(sqlStart);
sqlBuilder.append(valuesBuilder);
sqlBuilder.append(sqlEnd);
stmt = con.prepareStatement(sqlBuilder.toString());
sqlBuilder = new StringBuilder();
valuesBuilder = new StringBuilder();
stmt.executeUpdate();
} catch (SQLException ex) {
System.out.println(ex.getMessage());
ex.printStackTrace();
} catch (DataAccessException ex) {
ex.printStackTrace();
}
finally {
// Nothing left to flush here; the final partial batch was already sent above.
try {
if (stmt != null) {
stmt.close();
}
if (con != null) {
con.close();
}
rows.getContext().stop();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
public static IterableResult<String[], ParsingContext> readCSV(String filePath) {
File file = new File(filePath);
//configure the parser here. By default all values are trimmed
CsvParserSettings parserSettings = new CsvParserSettings();
//create the parser
CsvParser parser = new CsvParser(parserSettings);
//create an iterable over rows. This will not load everything into memory.
IterableResult<String[], ParsingContext> rows = parser.iterate(file);
return rows;
}
//This just takes care of NumberFormatExceptions that I had been getting.
private static Double parseDouble(String str) {
try {
Double value=Double.parseDouble(str);
return value;
} catch (NumberFormatException ex) {
System.out.println("There was probably a null value");
return null;
}
}
Upvotes: 0
Reputation: 31
If your record count is less than or equal to 1000, the following code is better than both of your versions:
StringBuilder query = new StringBuilder("INSERT INTO table (id, name, value) VALUES ");
if (records.size() <= 1000) {
for (int i = 0; i < records.size(); i++)
query.append("(?, ?, ?), ");
query = new StringBuilder(query.substring(0, query.length() - 2)); // drop the trailing ", "
PreparedStatement ps = connection.prepareStatement(query.toString());
for (int i = 0; i < records.size(); i++) {
Record record = records.get(i);
ps.setInt((i * 3) + 1, record.id);
ps.setString((i * 3) + 2, record.name);
ps.setInt((i * 3) + 3, record.value);
}
ps.executeUpdate();
}
This way you are still using a PreparedStatement, but you build it dynamically based on the size of your record list, with a multi-row VALUES clause in a single insert query. If the list can be longer than 1000, you can process it in fixed-size chunks, as in the sketch below.
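A minimal sketch of that chunked variant, assuming the Record class, records list, and connection from the question (the 1000-row chunk size is just an assumed limit):
int chunkSize = 1000;
for (int start = 0; start < records.size(); start += chunkSize) {
    List<Record> chunk = records.subList(start, Math.min(start + chunkSize, records.size()));
    // Build one multi-row insert for this chunk.
    StringBuilder sql = new StringBuilder("INSERT INTO table (id, name, value) VALUES ");
    for (int i = 0; i < chunk.size(); i++) {
        sql.append(i == 0 ? "(?, ?, ?)" : ", (?, ?, ?)");
    }
    try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
        for (int i = 0; i < chunk.size(); i++) {
            Record record = chunk.get(i);
            ps.setInt((i * 3) + 1, record.id);
            ps.setString((i * 3) + 2, record.name);
            ps.setInt((i * 3) + 3, record.value);
        }
        ps.executeUpdate();
    }
}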
Upvotes: 2
Reputation: 159774
executeBatch will give improved performance over executeUpdate as long as autocommit is set to false:
connection.setAutoCommit(false);
PreparedStatement ps = connection.prepareStatement(query);
for (Record record : records) {
// etc.
ps.addBatch();
}
ps.executeBatch();
connection.commit();
Upvotes: 46
Reputation: 4090
You can face a serious performance issue if the number of items that you want to insert is large. Therefore, it is safer to define a batch size and execute the query each time the batch size is reached.
Something like the following example code should work. For the full story of how to use this code efficiently, please see this link.
private static void insertList2DB(List<String> list) {
final int batchSize = 1000; //Batch size is important.
Connection conn = getConnection();
PreparedStatement ps = null;
try {
String sql = "INSERT INTO theTable (aColumn) VALUES (?)";
ps = conn.prepareStatement(sql);
int insertCount=0;
for (String item : list) {
ps.setString(1, item);
ps.addBatch();
if (++insertCount % batchSize == 0) {
ps.executeBatch();
}
}
ps.executeBatch();
} catch (SQLException e) {
e.printStackTrace();
System.exit(1);
}
finally {
try {
if (ps != null) {
ps.close();
}
if (conn != null) {
conn.close();
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Upvotes: 5
Reputation: 29
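If you are using Spring, JdbcTemplate's batchUpdate with a BatchPreparedStatementSetter does the same batched insert (the sketch below assumes a GenericDAO that exposes the template):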
public void insertBatch(final List<Record> records) {
String query = "INSERT INTO table (id, name, value) VALUES (?, ?, ?)";
GenericDAO.getJdbcTemplate().batchUpdate(query, new BatchPreparedStatementSetter() {
@Override
public void setValues(PreparedStatement ps, int i) throws SQLException {
Record record = records.get(i);
ps.setInt(1, record.id);
ps.setString(2, record.name);
ps.setInt(3, record.value);
}
@Override
public int getBatchSize() {
return records.size();
}
});
}
Upvotes: 1
Reputation: 400
I think this will do:
String query = "INSERT INTO table (id, name, value) VALUES ";
for (Record record : records)
{
query += "(" + record.id + ",'" + record.name + "'," + record.value + "),";
query = query.substring(1, query.length() - 1);
PreparedStatement ps = connection.prepareStatement(query);
ps.executeUpdate();
}
because you have to execute the query for each record to insert it into the database.
Upvotes: -7
Reputation: 2540
First of all, with query string concatenation you not only lose the type conversion native to the PreparedStatement setters, but you also become vulnerable to malicious code being executed in the database; see the sketch below.
Second, prepared statements are cached by the database itself, and this already gives a very good performance improvement over plain Statements.
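A minimal sketch of that injection risk, reusing the table and columns from the question and its connection variable (the malicious name value is made up, and whether the injected statement actually runs depends on driver settings):
// Concatenation: a crafted name breaks out of the string literal and injects SQL.
String maliciousName = "x',0); DROP TABLE table; --";
String unsafe = "INSERT INTO table (id, name, value) VALUES (1,'" + maliciousName + "',2)";
// connection.createStatement().executeUpdate(unsafe); // may now run statements you never wrote

// Parameterized: the driver sends the name purely as data, never as SQL.
try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO table (id, name, value) VALUES (?, ?, ?)")) {
    ps.setInt(1, 1);
    ps.setString(2, maliciousName); // stored literally; no SQL is executed from it
    ps.setInt(3, 2);
    ps.executeUpdate();
}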
Upvotes: 32