Moh-Aw

Reputation: 3018

Do not update row in ResultSet if data has changed

We are extracting data from various database types (Oracle, MySQL, SQL Server, ...). Once the data has been successfully written to a file, we want to mark it as transmitted by updating a specific column.

Our problem is that a user may change the data in the meantime but forget to commit. The record is then locked by a SELECT FOR UPDATE statement, so our update waits, and once the user finally commits we can end up marking data as transmitted which actually is not.

This is an excerpt from our code:

Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (!extractedData.rowUpdated()) {
        extractedData.updateString("COLUMNNAME", "TRANSMITTED");
        // code will stop here if user has changed data but did not commit
        extractedData.updateRow();
        // once committed the changed data is marked as transmitted
    }
}

The method extractedData.rowUpdated() returns false because, technically, the user hasn't changed anything yet from our cursor's point of view. Is there any way, at this late stage, to detect that the row has uncommitted changes and skip the update?

Unfortunately I cannot change the program the user is using to change the data.

Upvotes: 14

Views: 2227

Answers (3)

Moh-Aw

Reputation: 3018

In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number. Some other tables have a timestamp column that holds the time of the last change (changed by trigger).

While a timestamp might not always be a reliable basis for optimistic locking, we went with it anyway; several changes within a single second are not realistic in our environment.

Since we have to determine the primary key without it being described beforehand, we had to access the ResultSet and database metadata. Some of our databases do not support this (legacy DB/2 tables, for example). We are still using the old approach for these.

Note: The tableMetaData is an XML-config file where our description of the table is stored. This is not directly related to the metadata of the table in the database.

Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
// both helper methods iterate the ResultSet themselves,
// so there must be no outer next() loop here (it would skip the first row)
if (tableMetaData.getVersion() != null) {
    markDataAsExported(extractedData, tableMetaData);
} else {
    markResultSetAsExported(extractedData, tableMetaData);
}

// new way with building of an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    ResultSet resultSetPrimaryKeys = null;
    PreparedStatement versionedUpdateStatement = null;
    try {
        ResultSetMetaData extractedMetaData = extractedData.getMetaData();
        resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
        ArrayList<String> primaryKeyList = new ArrayList<String>();
        String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
                + " = ? where ";
        if (resultSetPrimaryKeys.isBeforeFirst()) {
            while (resultSetPrimaryKeys.next()) {
                primaryKeyList.add(resultSetPrimaryKeys.getString(4));
                sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
            }
            sqlStatement += tableMetaData.getVersionColumn() + " = ?";
            versionedUpdateStatement = conn.prepareStatement(sqlStatement);
            while (extractedData.next()) {
                versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
                for (int i = 0; i < primaryKeyList.size(); i++) {
                    versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
                            extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
                }
                versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
                        extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
                if (versionedUpdateStatement.executeUpdate() == 0) {
                    logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
                }
            }
        } else {
            logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
            markResultSetAsExported(extractedData, tableMetaData);
        }
    } finally {
        if (resultSetPrimaryKeys != null) {
            resultSetPrimaryKeys.close();
        }
        if (versionedUpdateStatement != null) {
            versionedUpdateStatement.close();
        }
    }
}

//the old way as fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    while (extractedData.next()) {
        extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
        extractedData.updateRow();
    }
}

Upvotes: 0

TT.

Reputation: 16137

Regarding "a user [...] might forget to commit": a user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.

To work around that you need to either:

  • Start a transaction with isolation level SERIALIZABLE, and within that transaction:
    • Read the data and export it. Data read this way is blocked from being updated.
    • Update the data you processed. Note: don't do that with an updatable ResultSet, do it with an UPDATE statement. That way you don't need a CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE ResultSet, which is much slower than CONCUR_READ_ONLY + TYPE_FORWARD_ONLY.
  • Commit the transaction.

That way the buggy software will be blocked from updating data you are processing.
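The SERIALIZABLE approach can be sketched in JDBC roughly as follows. This is a minimal sketch, not the answerer's code: the table name DATA_TABLE, the STATUS column, and the 'TRANSMITTED' marker are made-up stand-ins for the question's schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SerializableExport {

    // Hypothetical table and column names, standing in for the real schema.
    static final String SELECT_SQL =
            "select * from DATA_TABLE where STATUS is null";
    static final String UPDATE_SQL =
            "update DATA_TABLE set STATUS = 'TRANSMITTED' where STATUS is null";

    void exportAndMark(Connection conn) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        int oldIsolation = conn.getTransactionIsolation();
        try {
            conn.setAutoCommit(false);
            // SERIALIZABLE blocks other sessions from changing the rows we read
            // until we commit.
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try (PreparedStatement select = conn.prepareStatement(SELECT_SQL);
                 ResultSet rs = select.executeQuery()) {
                writeDataToFile(rs); // export while the rows are protected
            }
            // Plain UPDATE instead of an updatable ResultSet: the cursor above
            // can stay cheap (TYPE_FORWARD_ONLY / CONCUR_READ_ONLY).
            try (PreparedStatement update = conn.prepareStatement(UPDATE_SQL)) {
                update.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setTransactionIsolation(oldIsolation);
            conn.setAutoCommit(oldAutoCommit);
        }
    }

    void writeDataToFile(ResultSet rs) throws SQLException {
        // export logic from the question goes here
    }
}
```

Restoring the previous isolation level and auto-commit mode in the finally block matters when the Connection comes from a pool.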

Another way

  • Start a transaction at a lower isolation level (the default READ COMMITTED) and within that transaction:
    • Select the data with proper table hints, e.g. for SQL Server: TABLOCKX + HOLDLOCK (large datasets), ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to a SERIALIZABLE transaction. Note that lock escalation may promote the latter two to table locks if the number of locks grows too high.
    • Update the data you processed. Note: use an UPDATE statement and lose the updatable/scroll-sensitive ResultSet.
  • Commit the transaction.

Same deal, the buggy software will be blocked from updating data you are processing.
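The hinted variant looks the same apart from the SELECT; a sketch under the same assumptions (DATA_TABLE and STATUS are hypothetical, and the WITH (...) clause is T-SQL, so this will not run against Oracle or MySQL):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HintedExport {

    // Hypothetical names; the table hints are SQL Server specific.
    static final String LOCKED_SELECT =
            "select * from DATA_TABLE with (ROWLOCK, XLOCK, HOLDLOCK) "
            + "where STATUS is null";
    static final String MARK_UPDATE =
            "update DATA_TABLE set STATUS = 'TRANSMITTED' where STATUS is null";

    void exportAndMark(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // default READ COMMITTED isolation is fine here
        try {
            try (PreparedStatement select = conn.prepareStatement(LOCKED_SELECT);
                 ResultSet rs = select.executeQuery()) {
                // XLOCK takes exclusive row locks and HOLDLOCK keeps them
                // until commit, so nobody can change the rows while we export.
                writeDataToFile(rs);
            }
            try (PreparedStatement update = conn.prepareStatement(MARK_UPDATE)) {
                update.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    void writeDataToFile(ResultSet rs) throws SQLException {
        // export logic from the question goes here
    }
}
```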

Upvotes: 0

Jan

Reputation: 13858

So you want to

  • Run through all rows of the table that have not been exported
  • Export this data somewhere
  • Mark these rows exported so your next iteration will not export them again
  • As there might be pending changes on a row, you don't want to mess with that information

How about:

You iterate over all rows. 

for every row 
   generate a hash value for the contents of the row
   compare column "UPDATE_STATUS" with calculated hash
   if no match
     export row
     store hash into "UPDATE_STATUS" 
      if store fails (row locked) 
         -> no worries, will be exported again next time
      if store succeeds (on data already changed by user) 
         -> no worries, will be exported again as hash will not match

This might further slow your export, as you have to iterate over all rows instead of only those WHERE UPDATE_STATUS IS NULL. But you could split the work into two jobs: a fast one iterating over WHERE UPDATE_STATUS IS NULL, and a slow, thorough one over WHERE UPDATE_STATUS IS NOT NULL (with the hash re-checking in place).

If you want to avoid store failures/waits, you might store the hash/updated information in a second table that copies the primary key plus the hash value. That way user locks on the main table would not interfere with your updates at all, as those would be on another table.
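The pseudocode above leaves the hash function open. One possible shape, as a pure Java helper using SHA-256 from the JDK (collecting the column values from the ResultSet, and the UPDATE_STATUS column itself, are left to the caller and are assumptions from this answer):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class RowHasher {

    // Hash all column values of one row into a 64-character hex string
    // that could be stored in the (hypothetical) UPDATE_STATUS column.
    static String hashRow(List<String> columnValues) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String value : columnValues) {
                md.update((value == null ? "\u0000" : value)
                        .getBytes(StandardCharsets.UTF_8));
                // Separator byte so ("ab", "") and ("a", "b") hash differently.
                md.update((byte) 0x1F);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory in every JRE, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The separator byte between values avoids concatenation collisions; without it, two rows with different column boundaries but the same concatenated text would get the same hash and be wrongly skipped.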

Upvotes: 1
