Reputation: 21
My context is PostgreSQL 8.3
I need to speed up this query as both tables have millions of records.
For each row in table Calls, there are two rows in Trunks table. For every call_id, I want to copy value from trunks.trunk to calls.orig_trunk when trunk_id is the lowest trunk_id of the two rows. ...And copy value from trunks.trunk to calls.orig_trunk when trunk_id is the highest trunk_id of the two rows.
initial content of Table Calls:
Call_ID | dialed_number | orig_trunk | dest_trunk
--------|---------------|------------|-----------
1 | 5145551212 | null | null
2 | 8883331212 | null | null
3 | 4164541212 | null | null
Table Trunks:
Call_ID | trunk_id | trunk
--------|----------|-------
1 | 1 | 116
1 | 2 | 9
2 | 3 | 168
2 | 4 | 3
3 | 5 | 124
3 | 6 | 9
final content of Table Calls:
Call_ID | dialed_number | orig_trunk| dest_trunk
--------|---------------|-----------|----------
1 | 5145551212 | 116 | 9
2 | 8883331212 | 168 | 3
3 | 4164541212 | 124 | 9
I have created index for every column.
update calls set orig_trunk = t2.trunk
from ( select call_id,trunk_id from trunks
order by trunk_id ASC ) as t2
where (calls.call_id=t2.call_id );
update calls set dest_trunk = t2.trunk
from ( select call_id,trunk_id from trunks
order by trunk_id DESC ) as t2
where (calls.call_id=t2.call_id );
Any ideas ?
Upvotes: 1
Views: 189
Reputation: 21
This is the final code with test conditions as comments. The subquery is very efficient and rapid. However the test revealed that partitionning the table will have a greater impact on execution time than efficiency of the subquery. On a table of 1 million rows, the update takes 80 seconds. On a table of 12 millions rows, the update takes 580 seconds.
update calls1900 set orig_trunk = a.orig_trunk, dest_trunk = a.dest_trunk
from (select
x.call_id,
t1.trunk as orig_trunk, t2.trunk as dest_trunk
from (select calls1900.call_id
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls1900
join trunks t on (t.call_id = calls1900.call_id)
-- where calls1900.call_id between 43798930 and 43798950
group by calls1900.call_id
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
) a
where (calls1900.call_id = a.call_id); -- and (calls1900.call_id between 43798930 and 43798950)<code>
Upvotes: 1
Reputation: 9170
From the example posted, it looks like many unnecessary updates are being performed. Here is an example of a query to get the results you are looking for:
select distinct c.call_id, c.dialed_number
,first_value(t.trunk) over w as orig_trunk
,last_value(t.trunk) over w as dest_trunk
from calls c
join trunks t on (t.call_id = c.call_id)
window w as (partition by c.call_id
order by trunk_id
range between unbounded preceding
and unbounded following
)
There are other ways to do it without the analytic function, for example:
select x.call_id
,x.dialed_number
,t1.trunk as orig_trunk
,t2.trunk as dest_trunk
from (select c.call_id, c.dialed_number
,min(t.trunk_id) as orig_trunk_id
,max(t.trunk_id) as dest_trunk_id
from calls c
join trunks t on (t.call_id = c.call_id)
group by c.call_id, c.dialed_number
) x
join trunks t1 on (t1.trunk_id = x.orig_trunk_id)
join trunks t2 on (t2.trunk_id = x.dest_trunk_id)
Experiment to see what works best in your situation. Probably want to be indexed on the joining columns.
What to do with the result set is dependent on the nature of the application. Is this a one off? Then why not just create a new table from the result set:
CREATE TABLE trunk_summary AS
SELECT ...
Is it constantly changing? Is it frequently accessed? Is it sufficient to just create a view? Or maybe an update is to be performed based on the result set. Maybe a range can be updated at a time. It really depends, but this might give a start.
Upvotes: 0