Reputation: 10139
What I have is a query that grabs a set of data. This query is ran at a certain time. Then, 30 minutes later, I have another query (same syntax) that runs and grabs that same set of data. Finally, I have a third query (which is the query in question) that compares both sets of data. The records it pulls out are ones that agree with: if "FEDVIP_Active
" was FALSE in the first data set and TRUE in the second data set, OR "UniqueID
" didn't exist in the first data set and does in the second data set AND FEDVIP_Active
is TRUE. I'm questioning the performance of the query below that does the comparison. It times out after 30 minutes. Is there anything you can see that I shouldn't be doing in order to be the most efficient to run? The two identical-ish data sets I'm comparing have around a million records each.
First query that grabs the initial set of data:
select Unique_ID, First_Name, FEDVIP_Active, Email_Primary
from Master_Subscribers_Prospects
Second query is exactly the same as the first.
Then, the third query below compares the data:
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b
on 1 = 1
where a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0 or
(b.Unique_ID not in (select Unique_ID from Master_Subscribers_Prospects_1) and b.FEDVIP_Active = 1)
Upvotes: 0
Views: 4925
Reputation: 37472
An inner join on 1 = 1
is a disguised cross join and the number of rows a cross join produces can grow rapidly. It's the product of the number of rows in both relations involved. For performance you want to keep intermediate results as small as possible.
Then instead of IN
EXISTS
is often performing better, when the number of rows of the subquery is large.
But I think you don't need IN
or EXITS
at all.
Assuming unique_id
identifies a record and is not null, you could left join the first table to the second one on common unique_id
s. Then if and only if no record for an unique_id
in the second table exits the unique_id
of the first table in the result of the join is null, so you can check for that.
SELECT b.fedvip_active,
b.unique_id,
b.first_name,
b.email_primary
FROM master_subscribers_prospects_2 b
LEFT JOIN master_subscribers_prospects_1 a
ON b.unique_id = a.unique_id
WHERE a.fedvip_active = 1
AND b.fedvip_active = 0
OR a.unique_id IS NULL
AND b.fedvip_active = 1;
For that query indexes on master_subscribers_prospects_1 (unique_id, fedvip_active)
and master_subscribers_prospects_2 (unique_id, fedvip_active)
might also help to speed things up.
Upvotes: 1
Reputation: 1269953
If I understand correctly, you want all records from the second data set where the corresponding unique id in the first data set is not active (either by not existing or by having the flag set to not active).
I would suggest exists
:
select a.*
from Master_Subscribers_Prospects_1 a
where a.FEDVIP_Active = 1 and
not exists (select 1
from Master_Subscribers_Prospects_2 b
where b.Unique_ID = a.Unique_ID and
b.FEDVIP_Active = 1
);
For performance, you want an index on Master_Subscribers_Prospects_2(Unique_ID, FEDVIP_Active)
.
Upvotes: 1
Reputation: 4870
Doing an inner select in where sats is always bad.
Here is a same version with a left join
, that might work for you.
select
a.FEDVIP_Active,
a.Unique_ID,
a.First_Name,
a.Email_Primary
from
Master_Subscribers_Prospects_1 a
inner join
Master_Subscribers_Prospects_2 b on 1 = 1
left join Master_Subscribers_Prospects_1 sa on sa.Unique_ID = b.Unique_ID
where (a.FEDVIP_Active = 1 and b.FEDVIP_Active = 0) or
(sa.Unique_ID is null and b.FEDVIP_Active = 1)
Upvotes: 0