Reputation: 3789
TL;DR - I am accumulating a transaction in postgresql (pqxx) and suspect I'm doing something overly tortuous to get where I'm going.
In the first part of the question I show the naive way one might do a series of inserts and explain that it is (first) slow and (second) can lead to an error. Then I show where reading the docs has led me: I'm pretty sure it's an abusive way to interact with libpqxx, but I'm not sure what the intended pattern is.
I have some code that wants to write a bunch of stuff to the database, typically via INSERT ... ON CONFLICT ... DO UPDATE. Roughly, the code loops over some container and generates the SQL needed to insert each container object that needs inserting.
The naive way to do this (skipping error/exception handling for the moment) is thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
for (const auto& row : rows) {
  if (row.IsDirty()) {
    RowToSQL(sql_stmt, row, txn);
    txn.exec(sql_stmt.str());
    // Clear sql_stmt here.
  }
}
txn.commit();
The function RowToSQL() takes a transaction object so that it can quote strings appropriately using txn.quote().
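For concreteness, here is roughly what RowToSQL() does; the table and column names below are invented placeholders, but the shape is accurate: it streams an INSERT ... ON CONFLICT statement into the ostringstream and relies on txn.quote() to escape and quote string values.
void RowToSQL(ostringstream& sql, const Row& row, pqxx::transaction_base& txn) {
  sql << "INSERT INTO widgets (id, name, value) VALUES ("
      << row.id << ", "
      << txn.quote(row.name) << ", "  // quote() escapes and single-quotes the string
      << row.value << ") "
         "ON CONFLICT (id) DO UPDATE "
         "SET name = EXCLUDED.name, value = EXCLUDED.value; ";
}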
This is inefficient, however: calling exec() over and over turns out to be quite slow. So instead I build up a bunch of statements in the ostringstream, thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
int counter = 0;  // statements since the last exec()
int count = 0;    // statements since the last commit()
for (const auto& row : rows) {
  if (row.IsDirty()) {
    RowToSQL(sql_stmt, row, txn);
    ++count;
    if (++counter > kExecuteThreshold || sql_stmt.IsTooBig()) {  // IsTooBig(): pseudocode for a size check on sql_stmt.str()
      txn.exec(sql_stmt.str());
      // Clear sql_stmt, reset counter here.
    }
    if (count > kCommitThreshold) {
      txn.commit();
      // Reset statement counter (count) here.
    }
  }
}
// Final commit here.
I chose the two thresholds based on performance testing in our environment; think 100 and 10,000 for order of magnitude.
This worked until it didn't: re-using the transaction object after commit() this way fails with
Attempt to activate transaction<READ COMMITTED> which is already closed.
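As far as I can tell, the underlying issue is simply that commit() closes the pqxx::work, so any further exec() on the same object throws. A minimal sketch of the failure mode:
pqxx::work txn(conn);
txn.exec("INSERT INTO t VALUES (1)");  // fine
txn.commit();                          // transaction is now closed
txn.exec("INSERT INTO t VALUES (2)");  // throws: can't re-activate a closed transaction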
This SO question addresses this mostly.
This leads me to write something that has a weird code smell, and so I suspect I've misunderstood how PostgreSQL / libpqxx are intended to be used.
Adding back error/exception handling:
pqxx::nontransaction non_txn(conn);  // only used for quoting
ostringstream sql_stmt;
Container rows;
vector<string> transactions;
int counter = 0;  // statements since the last batch was cut
int count = 0;    // statements since the last commit
for (const auto& row : rows) {
  if (row.IsDirty()) {
    RowToSQL(sql_stmt, row, non_txn);
    ++count;
    if (++counter > kExecuteThreshold || sql_stmt.IsTooBig()) {
      transactions.push_back(sql_stmt.str());
      // Clear sql_stmt, reset counter here.
    }
    if (count > kCommitThreshold) {
      try {
        pqxx::work txn(conn);
        for (const string& trans : transactions) {
          txn.exec(trans);
        }
        txn.commit();
        transactions.clear();
        // Reset statement counter (count) here.
      } catch (const exception& e) {
        // txn rolls back automatically on destruction; no explicit abort() needed
        // (and it's out of scope here anyway).
        YellowAlert();  // Something appropriate.
      }
    }
  }
}
// Final commit here.
It seems quite wrong to me that I should build up this vector of statement batches and maintain these custom-tuned execution/commit thresholds rather than using some facility of libpqxx. That is, this pattern seems common enough that the complexity I'm accumulating strikes me as a sign that I've misunderstood something.
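(The closest built-in facility I've come across is pqxx::pipeline, which lets you queue statements on a transaction without waiting for each result. Something like the sketch below is what I imagined, though I'm not sure it's the intended tool here, and it doesn't obviously address the commit batching part; I also haven't worked out whether I need to retrieve() results I don't care about.)
pqxx::work txn(conn);
pqxx::pipeline pipe(txn);
ostringstream sql_stmt;
for (const auto& row : rows) {
  if (row.IsDirty()) {
    RowToSQL(sql_stmt, row, txn);
    pipe.insert(sql_stmt.str());  // queued for execution, doesn't block on a result
    // Clear sql_stmt here.
  }
}
pipe.complete();  // wait for everything queued to finish
txn.commit();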
Any pointers much appreciated.
Upvotes: 2
Views: 917