Reputation: 513
I have a table with about 50 columns. Whenever a row changes, I don't know in advance which columns will change, and I don't want to handle every permutation and combination of columns when updating the table.
So I update all 50 columns, which, I know, takes much longer than I'd like when dealing with a huge number of updates.
UPDATE TBLX SET NAME = ? WHERE ID = ?;
Result of Explain Update...
UPDATE
INDEX SCAN of "TBLX" using "TBLX_ID"
scan matches for (U_ID = ?5), filter by (column#0 = ?6)
UPDATE TBLX SET NAME = CASE WHEN (? != '####') THEN ? ELSE NAME END WHERE ID = ?;
Result of Explain Update...
UPDATE
INDEX SCAN of "TBLX" using "TBLX_ID"
scan matches for (U_ID = ?3), filter by (column#0 = ?4)
So my question is about the internals of query execution: how will each type of query be treated, and which one will run faster?
What I want to understand is whether the executor will skip the part of the query where I am not changing a column's value, i.e. where it assigns the same value back to the column.
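For context, extending the sentinel CASE pattern from the single-column example above to several columns would look something like this (the column names here are hypothetical, standing in for the real 50):

```sql
-- Each column keeps its current value unless a real value (not the
-- '####' sentinel) is passed for it. NAME, ADDRESS, PHONE are made-up
-- column names used only for illustration.
UPDATE TBLX
SET NAME    = CASE WHEN (? != '####') THEN ? ELSE NAME    END,
    ADDRESS = CASE WHEN (? != '####') THEN ? ELSE ADDRESS END,
    PHONE   = CASE WHEN (? != '####') THEN ? ELSE PHONE   END
WHERE ID = ?;
```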
Upvotes: 2
Views: 154
Reputation: 1482
The plans show that both queries are using a match on the TBLX_ID index, which is the fastest way to find the particular row or rows to be updated. If it is a single row, this should be quite fast.
The difference between these two queries is essentially the work each one does for the update once it has found the row. While the plan doesn't show the steps it will take when updating one row, it should be fast either way. At that point, it's native C++ code updating a row in memory that it has exclusive access to. If I had to guess, the one using the CASE clause may take slightly longer, but the difference could be negligible. You'd have to run a benchmark and measure the execution times to be certain, but I would expect it to be fast in both cases.
What would be more significant than the difference between these two updates is how you handle updating multiple columns. The cost of finding the affected row may be higher than the logic of the actual updates to the columns. If you designed it so that updating n columns means queueing n SQL statements, then the engine has to execute n statements and use the same index to find the same row n times; all of that overhead would be much more significant. If instead you had one complex UPDATE statement with many parameters, where you could pass in new values for some columns and the current values for the rest, then the engine only has to execute one statement and find the row once. Even though that seems complex, it would probably be faster. Faster still may be to simply update all of the columns to the new values, whether each one is the same as the current value or not.
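The "update everything in one statement" option described above could be sketched like this, again with hypothetical column names; the application passes the current value as the parameter for any column that isn't changing:

```sql
-- One statement, one index lookup: every column is set explicitly.
-- NAME, ADDRESS, PHONE are illustrative column names only.
UPDATE TBLX SET NAME = ?, ADDRESS = ?, PHONE = ? WHERE ID = ?;
```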
If you can test this by running a few hundred examples, then the output of "exec @Statistics PROCEDUREDETAIL 0;" will show the average execution time for each of the SQL statements as well as for the entire procedure. That should provide the metrics necessary to find the optimal way to do this.
Upvotes: 1