Bill

Reputation: 1247

Avoid redundant updates

I'm working on an IO-bound system (and this isn't going to change), so I'm rewriting some SQL to update only when it needs to, and it's going really well: I'm seeing about a 70% increase in performance. The only problem is that the SQL is more bloated, which isn't the end of the world, just more code to maintain.

So my question is: is there an easier way to get Oracle to update only when it needs to, compared to adding a where clause like this:

update table_name
   set field_one = 'one'
 where field_one != 'one';

Note: The real code is much more complex, so adding a where clause like this can sometimes double the length of the query.

Using 11g
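The effect of the extra predicate is easy to see in miniature. Below is a sketch in Python's sqlite3 as a stand-in for Oracle (the rows-touched arithmetic is the same in any engine): the filtered UPDATE reports only the rows that actually changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (field_one TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?)",
                 [("one",)] * 90 + [("two",)] * 10)

# Without the predicate every row is rewritten, even the 90 rows that
# already hold the target value.
rows_all = conn.execute("UPDATE table_name SET field_one = 'one'").rowcount

# Reset the data, then add the predicate: only the 10 stale rows are touched.
conn.execute("DELETE FROM table_name")
conn.executemany("INSERT INTO table_name VALUES (?)",
                 [("one",)] * 90 + [("two",)] * 10)
rows_filtered = conn.execute(
    "UPDATE table_name SET field_one = 'one' WHERE field_one != 'one'"
).rowcount

print(rows_all, rows_filtered)   # 100 10
```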

Upvotes: 7

Views: 1571

Answers (12)

dani herrera

Reputation: 51665

Create a view over the table and write a custom INSTEAD OF trigger:

--DDL create your sample table:
create table table_name (
   field_one varchar2(100)
  );

--DDL create the view
create view view_name as 
select * from table_name;    

--DDL create the INSTEAD OF trigger
CREATE OR REPLACE TRIGGER trigger_name
   INSTEAD OF UPDATE ON view_name
   FOR EACH ROW
   BEGIN
     UPDATE table_name
     SET field_one = :NEW.field_one
     where :OLD.field_one != :NEW.field_one
     ;
   END trigger_name;

--DML testing:
update view_name set field_one = 'one';
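The same mechanism can be sketched in Python's sqlite3, which also supports INSTEAD OF triggers on views (Oracle syntax differs, and a primary key is assumed here so the trigger can correlate view rows with base rows). Counting total row changes shows that only the genuinely stale row gets rewritten:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (id INTEGER PRIMARY KEY, field_one TEXT);
INSERT INTO table_name VALUES (1, 'one'), (2, 'two'), (3, 'one');

CREATE VIEW view_name AS SELECT id, field_one FROM table_name;

-- Fires once per view row; only writes the base table when the value
-- actually changes.
CREATE TRIGGER trigger_name INSTEAD OF UPDATE ON view_name
BEGIN
  UPDATE table_name
     SET field_one = NEW.field_one
   WHERE id = NEW.id
     AND OLD.field_one <> NEW.field_one;
END;
""")

before = conn.total_changes
conn.execute("UPDATE view_name SET field_one = 'one'")   # touches all 3 view rows
rows_rewritten = conn.total_changes - before
print(rows_rewritten)   # 1 -- only the row that held 'two' was written
```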

EDITED, test results

I have tested my approach in a 200K-row scenario with a 1/20 factor of updatable rows. Here are the results:

  • Script time updating the table directly: 1,559.05 secs
  • Script time updating through the trigger: 1,101.14 secs

Steps to reproduce the test:

Creating table, view and trigger:

create table table_name (
   myPK int primary key,
   field_1 varchar2(100),
   field_2 varchar2(100),
   field_3 varchar2(4000)
  );

create view view_name as 
select * from table_name;

CREATE OR REPLACE TRIGGER trigger_name
   INSTEAD OF UPDATE ON view_name
   FOR EACH ROW
   BEGIN
     UPDATE table_name
     SET 
        field_1 = :NEW.field_1,
        field_2 = :NEW.field_2
     where 
        myPK = :OLD.myPK
        AND not ( :OLD.field_1 = :NEW.field_1 and
                  :OLD.field_2 = :NEW.field_2 )
     ;
   END trigger_name;

Inserting dummy data with 1/20 factor for updatable rows:

DECLARE
   x NUMBER := 300000;
BEGIN
   FOR i IN 1..x LOOP
      IF MOD(i,20) = 0 THEN     
         INSERT INTO table_name VALUES (i, 'rare', 'hello', 
                                           dbms_random.string('A', 2000));
      ELSE
         INSERT INTO table_name VALUES (i, 'ordinary', 'bye', 
                                           dbms_random.string('A', 2000) );
      END IF;
   END LOOP;
   COMMIT;
END;

Script for testing performance:

declare
    l_start number;
    l_end number;
    l_diff number;    
    rows2update int;

begin
   rows2update := 100000;
   l_start := dbms_utility.get_time;
   FOR i IN 1..rows2update LOOP
      update view_name    --<---- replace by table_name to test without trigger
         set field_1 = 'ordinary' 
       where myPK = round( dbms_random.value(1,300000) )    ; 
      commit;

   end loop;

   l_end := dbms_utility.get_time ;
   dbms_output.put_line('l_start ='||l_start);
   dbms_output.put_line('l_end ='||l_end);
   l_diff := (l_end-l_start)/100;
   dbms_output.put_line('Elapsed Time: '|| l_diff ||' secs');

end;
/

Disclaimer: This is a simple test made in a virtual environment, only as a first-approach test. I'm sure that results can change significantly just by changing field length or other parameters.

Upvotes: 2

Adrian

Reputation: 6101

This question reminded me of an old AskTom conversation: How to Update millions or records in a table

In a nutshell, it suggests the following solution to improve the performance of a large number of updates on a table:

  • CREATE TABLE new_table AS SELECT (do the update "here") FROM old_table
  • INDEX new_table, GRANT on new_table
  • ADD CONSTRAINTS ON new_table etc
  • DROP TABLE old_table
  • RENAME new_table TO old_table;

You can do that using parallel query, with "nologging" on most operations generating very little redo and no undo at all -- in a fraction of the time it would take to update the data.
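The rebuild-and-swap steps above can be sketched as follows, using Python's sqlite3 as a stand-in (in Oracle you would also use PARALLEL and NOLOGGING and re-create indexes, grants and constraints before the rename):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_table (id INTEGER PRIMARY KEY, field_one TEXT);
INSERT INTO old_table VALUES (1, 'one'), (2, 'two'), (3, 'three');

-- "Do the update here": apply the new value while copying the data.
CREATE TABLE new_table AS
SELECT id, 'one' AS field_one FROM old_table;

-- (Indexes and constraints would be re-created here in a real migration.)
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
""")

values = [r[0] for r in conn.execute(
    "SELECT field_one FROM old_table ORDER BY id")]
print(values)   # every row now holds 'one', without a single UPDATE
```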

However, I will keep it simple:

Create a temporary table containing the keys and values of the rows you intend to update; filter out the rows where no update is actually required because the new value is the same as the old value:

CREATE GLOBAL TEMPORARY TABLE temp_table_name ON COMMIT DELETE ROWS
AS
SELECT id, 'one' as field_one FROM table_name WHERE field_one != 'one'

Update the main table from the temporary table:

MERGE INTO table_name b
USING 
(
  SELECT id,
         field_one
  FROM temp_table_name
) a ON (a.id = b.id)
WHEN MATCHED THEN UPDATE 
    SET b.field_one = a.field_one

Or, as a correlated update:

UPDATE table_name b
   SET b.field_one = (SELECT a.field_one
                        FROM temp_table_name a
                       WHERE a.id = b.id)
 WHERE EXISTS (SELECT 1
                 FROM temp_table_name a
                WHERE a.id = b.id)


Upvotes: 0

Thomas Köhne

Reputation: 151

If your table has a column that in the end should consist of only one value - why do you need the column at all?

If for some reason you need the column, you can drop the column and then recreate it with the default value you want.

Upvotes: 1

Alessandro Rossi

Reputation: 2450

There shouldn't be an easy way to achieve your objective. But if you have the Enterprise Edition, there is one possibility, though it's very difficult anyway: implement the Virtual Private Database (VPD) feature through DBMS_RLS on the tables with poorly performing updates, using a policy that calls a function that works this way:

  1. Detects the SQL that is actually executing the update by joining some dynamic performance views (V$VPD_POLICY, V$SQLAREA and, if those aren't enough, also V$OPENCURSOR or V$SQL_MONITOR) for the current USER or current SESSION_ID.
  2. Parses the UPDATE statement to detect the columns used in the SET clause and their assigned values. You could use a parser generator like ANTLR or a similar tool to build a Java parser; once your tests work, you can load the JAR into the database and wrap it with a PL/SQL procedure.
  3. Returns the additional where condition: the conjunction of the old_column_value != new_column_value predicates detected by the parsing step.

An example flow:

  1. Your application executes an UPDATE:

    UPDATE <table> t
        SET t.<col1> = 5,
            <col2> = :named_bind,
            <col3> = (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)
    
  2. The system activates the VPD policy and invokes the just described function before executing the update

  3. This function after executing the query on the dynamic performance views fetches the current executing statement

  4. Then it parses the statement, which will return the 3 pairs (column_name, column_value):

("t.<col1>", "5"),
("<col2>", ":named_bind"),
("<col3>", "(SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)")
  5. The function returns the additional where clause:

t.<col1> != 5 
AND <col2> != :named_bind
AND <col3> != (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)
  6. The system will convert the statement into:

UPDATE <table> t
    SET t.<col1> = 5,
        <col2> = :named_bind,
        <col3> = (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)
WHERE t.<col1> != 5 
    AND <col2> != :named_bind
    AND <col3> != (SELECT o.<col5> FROM <oth_table> o WHERE t.<col4> = o.<col6>)

I haven't tested any part of this, and the job to do is very long, but in my opinion it should work fine once it's done. One last thing: this method wouldn't work with statements using positional bind variables; with named ones it could be fine, but I'm definitely not sure about it. This may also be a very intensive job, so it should be avoided for frequent and (already) fast UPDATEs; you'll also need to find a way to exclude those fast statements from processing.

Upvotes: 0

HLGEM

Reputation: 96570

Given the nature of how SQL works, this is exactly what you need to do. If you tell it:

update table_name    
   set field_one = 'one';

that means something entirely different in SQL than

update table_name
   set field_one = 'one'
 where field_one != 'one';

The database can only process what you told it to process. In the first case, because there is no where clause, you have told it to process all the records.

In the second case you have put a filter on it to process only some specific records.

It is up to the code writer, not the database, to determine the content of the query. If you didn't want every record updated, you should not have told it to do so. The database is quite literal about the commands you give it. Yes, the second set of queries is longer because they are being more specific. They have a different meaning than the original queries. That is all to the good, as it is far faster to update the ten records you are interested in than all 1,000,000 records in the table.

You need to get over the idea that longer is somehow a bad thing in database queries. Often it is a good thing, as you are being more precise in what you are asking for. Your original queries were simply incorrect, and now you have to pay the price to fix what was a systemically bad practice.

Upvotes: 6

Kombajn zbożowy

Reputation: 10693

Did I understand correctly that the problem here is repeating the value 'one' in the query?

If yes, then you can use the following:

update (select field_one, 'one' new_field_one from table_name)
   set field_one = new_field_one
 where field_one != new_field_one;

Upvotes: 1

mungea05

Reputation: 110

I would use a temp table ahead of your update. Find the records that need to be updated, then update just those records, using the temp table, with the values you want.

Upvotes: -1

kinjelom

Reputation: 6450

There is a simple explanation for this behavior:

any update, even with the same value, must fire triggers (both external and internal)

so, without these "idle" updates, your system may stop working in the way it was designed...

Further information: http://orainternals.wordpress.com/2010/11/04/does-an-update-statement-modify-the-row-if-the-update-modifies-the-column-to-same-value/

PS. The solution "noneditioning view + INSTEAD OF trigger" solves the access problem, but an INSTEAD OF trigger is always a row-level trigger, so this approach can ruin performance.
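The trigger-firing behavior is easy to demonstrate; here is a sqlite3 sketch (not Oracle, but row-level triggers behave the same way): a same-value UPDATE still fires the trigger once per row, while the filtered version fires it for none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (field_one TEXT);
INSERT INTO t VALUES ('one'), ('one'), ('two');
CREATE TABLE audit_log (n INTEGER);

-- Row-level trigger: logs one row per updated row, changed or not.
CREATE TRIGGER t_upd AFTER UPDATE ON t
BEGIN
  INSERT INTO audit_log VALUES (1);
END;
""")

# Unfiltered: all 3 rows are "updated", so the trigger fires 3 times.
conn.execute("UPDATE t SET field_one = 'one'")
fires_without_filter = conn.execute(
    "SELECT COUNT(*) FROM audit_log").fetchone()[0]

# Filtered: every row already holds 'one', so nothing fires.
conn.execute("DELETE FROM audit_log")
conn.execute("UPDATE t SET field_one = 'one' WHERE field_one != 'one'")
fires_with_filter = conn.execute(
    "SELECT COUNT(*) FROM audit_log").fetchone()[0]

print(fires_without_filter, fires_with_filter)   # 3 0
```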

Upvotes: 0

Mark J. Bobak

Reputation: 14403

Looking at your statement:

update table_name
   set field_one = 'one'
 where field_one != 'one';

The problem is the "field_one != 'one'" predicate. Using any conventional indexing scheme, this type of not equal predicate is going to cause a FULL TABLE SCAN, which is (likely) going to cause lots of I/O, which you're trying to avoid, particularly if the table is very large. So, what to do?

If the table is large, and the number of rows that satisfy the predicate is (relatively) small, we may be able to do something with function based indexes, and (since you're on 11g) cleverly hide the FBI behind a virtual column.

I'm thinking about something like this:

create table table_name(
   field_one varchar2(10),
   field_one_not_one varchar2(10) generated always as
      (case when field_one = 'one' then null else field_one end) virtual
);
create index field_one_not_one_indx on table_name(field_one_not_one);

Now, just do:

update table_name
   set field_one = 'one'
 where field_one_not_one is not null;

Now, the update should do a FULL INDEX SCAN, but that's going to be much less I/O than an FTS, particularly if the number of rows that need to be updated is relatively small. The idea is that the FBI will only have the rows that need to be updated. As long as that number of rows is significantly less than the total rows in the table, this should be an effective indexing strategy.
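As a rough cross-engine illustration of the same idea, SQLite's partial indexes play the role of the FBI/virtual-column trick: the index only contains the rows that still need the update (the table and index names below are made up for the sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (field_one TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?)",
                 [("one",)] * 95 + [("two",)] * 5)

# Partial index: only the rows where field_one != 'one' are indexed,
# mirroring the FBI that maps already-correct rows to NULL.
conn.execute("""CREATE INDEX field_one_not_one_indx
                ON table_name(field_one) WHERE field_one != 'one'""")

rows_touched = conn.execute(
    "UPDATE table_name SET field_one = 'one' WHERE field_one != 'one'"
).rowcount
print(rows_touched)   # 5 -- only the stale rows are visited and rewritten
```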

Upvotes: 0

JCalcines

Reputation: 1286

I assume that you are looking for an idea rather than code, so this is mine.

You could use Dynamic SQL Statements to compose your update queries. Let's take the following query:

UPDATE table_name SET field_one = 'one', field_two = 'where,=' WHERE id = 1

Instead of calling the update statement directly, you should call a procedure which manages the query. This procedure should split out the SET clause, from the first SET up to the WHERE, or to the end if there isn't a WHERE in your statement. You should be careful with what is in your assignments, since it could contain a WHERE:

field_one = 'one', field_two = 'where,='

Then replace every , with AND and every = with !=. Take care with the assignments:

field_one <> 'one' AND field_two <> 'where,'

Reattach it to the query in the WHERE clause with an AND. Maybe there isn't a WHERE clause, so you may have to add it:

UPDATE table_name 
SET field_one = 'one', field_two = 'where,=' 
WHERE id = 1 AND field_one != 'one' AND field_two != 'where,'

This way you can use your current queries with no modification, just a little wrapper call: update_wrapper('Your query');.

PS: I'm assuming from your commentary that you don't work with NULL values.
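A toy version of such a wrapper, to make the idea concrete (the update_wrapper name and the regex are illustrative; it assumes simple col = value assignments, no NULLs, and none of the tricky content inside the values that the answer warns about):

```python
import re

def update_wrapper(sql: str) -> str:
    """Append 'col != value' predicates derived from the SET clause."""
    m = re.match(
        r"(?is)^\s*UPDATE\s+(\S+)\s+SET\s+(.*?)(?:\s+WHERE\s+(.*))?$", sql)
    table, set_clause, where = m.group(1), m.group(2), m.group(3)
    # Turn each "col = value" assignment into a "col != value" predicate.
    predicates = [a.strip().replace("=", "!=", 1)
                  for a in set_clause.split(",")]
    extra = " AND ".join(predicates)
    # Extend the existing WHERE clause, or add one if it is missing.
    where = f"{where} AND {extra}" if where else extra
    return f"UPDATE {table} SET {set_clause} WHERE {where}"

print(update_wrapper("UPDATE table_name SET field_one = 'one' WHERE id = 1"))
# UPDATE table_name SET field_one = 'one' WHERE id = 1 AND field_one != 'one'
```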

Upvotes: 0

anandh04

Reputation: 86

Try using MERGE statements; they may reduce the running time of the update queries.

The query above can be rewritten like this:

MERGE INTO table_name
USING ( SELECT ROWID from table_name Where field_one != 'one') data_table
ON ( table_name.ROWID = data_table.ROWID)
WHEN MATCHED THEN
UPDATE SET table_name.field_one = 'one';

Upvotes: 1

Bill

Reputation: 1247

I guess there isn't an easier way....

Upvotes: 4
