Denis Munger
Denis Munger

Reputation: 21

update table from another table multiple rows

My context is PostgreSQL 8.3

I need to speed up this query as both tables have millions of records.

For each row in table Calls, there are two rows in Trunks table. For every call_id, I want to copy value from trunks.trunk to calls.orig_trunk when trunk_id is the lowest trunk_id of the two rows. ...And copy value from trunks.trunk to calls.orig_trunk when trunk_id is the highest trunk_id of the two rows.

initial content of Table Calls:

Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1       | 5145551212    |    null    |   null
2       | 8883331212    |    null    |   null
3       | 4164541212    |    null    |   null

Table Trunks:

Call_ID | trunk_id | trunk
--------|----------|-------
1       | 1        |  116
1       | 2        |  9
2       | 3        |  168
2       | 4        |  3
3       | 5        |  124
3       | 6        |  9 

final content of Table Calls:

Call_ID | dialed_number | orig_trunk| dest_trunk
--------|---------------|-----------|----------
1       | 5145551212    |    116    |   9
2       | 8883331212    |    168    |   3
3       | 4164541212    |    124    |   9

I have created index for every column.

update calls set orig_trunk = t2.trunk 
from ( select call_id,trunk_id from trunks 
     order by trunk_id ASC ) as t2 
where (calls.call_id=t2.call_id );

update calls set dest_trunk = t2.trunk 
from ( select call_id,trunk_id from trunks 
     order by trunk_id DESC ) as t2 
where (calls.call_id=t2.call_id );

Any ideas ?

Upvotes: 1

Views: 189

Answers (2)

Denis Munger
Denis Munger

Reputation: 21

This is the final code with test conditions as comments. The subquery is very efficient and rapid. However the test revealed that partitionning the table will have a greater impact on execution time than efficiency of the subquery. On a table of 1 million rows, the update takes 80 seconds. On a table of 12 millions rows, the update takes 580 seconds.

update calls1900 set orig_trunk = a.orig_trunk, dest_trunk = a.dest_trunk   
from (select 
  x.call_id,
      t1.trunk as orig_trunk, t2.trunk as dest_trunk 
  from (select calls1900.call_id
              ,min(t.trunk_id) as orig_trunk_id
              ,max(t.trunk_id) as dest_trunk_id
          from calls1900
          join trunks t on (t.call_id = calls1900.call_id)
          -- where calls1900.call_id between 43798930 and 43798950
          group by calls1900.call_id
        ) x
  join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
  join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
  ) a

where (calls1900.call_id = a.call_id); -- and (calls1900.call_id between 43798930 and 43798950)<code> 

Upvotes: 1

Glenn
Glenn

Reputation: 9170

From the example posted, it looks like many unnecessary updates are being performed. Here is an example of a query to get the results you are looking for:

select distinct c.call_id, c.dialed_number
      ,first_value(t.trunk) over w as orig_trunk
      ,last_value(t.trunk)  over w as dest_trunk
  from calls c
  join trunks t on (t.call_id = c.call_id)
  window w as (partition by c.call_id
               order by trunk_id
               range between unbounded preceding
                         and unbounded following
              )

There are other ways to do it without the analytic function, for example:

select x.call_id
      ,x.dialed_number
      ,t1.trunk as orig_trunk
      ,t2.trunk as dest_trunk
  from (select c.call_id, c.dialed_number
              ,min(t.trunk_id) as orig_trunk_id
              ,max(t.trunk_id) as dest_trunk_id
          from calls c
          join trunks t on (t.call_id = c.call_id)
          group by c.call_id, c.dialed_number
        ) x
  join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
  join trunks t2 on (t2.trunk_id = x.dest_trunk_id)

Experiment to see what works best in your situation. Probably want to be indexed on the joining columns.

What to do with the result set is dependent on the nature of the application. Is this a one off? Then why not just create a new table from the result set:

CREATE TABLE trunk_summary AS
  SELECT ...

Is it constantly changing? Is it frequently accessed? Is it sufficient to just create a view? Or maybe an update is to be performed based on the result set. Maybe a range can be updated at a time. It really depends, but this might give a start.

Upvotes: 0

Related Questions