Improve query performance in 300 million rows table

Question

I need to improve the performance of a query which takes way too long. The query is used in an SSIS data flow in the source task. It is part of a longer query but this is what causes the problem so I narrowed it down as follows:

select xa.*, s.idA 
from tableA s 
inner join tableB xa on s.idA = xa.idB
where xa.type = 'type_A'

TableA = 80 million rows
TableB = 340 million rows

TableA has a non clustered index on idA and TableB a non clustered index on idB. The execution plan uses the index of TableA and performs a full table scan on TableB which takes 98% of the effort. I have tried to create a non clustered index on TableB, on column idB including column Type. I had to kill the create query after an hour because it was suspended with a CXPACKET wait type. Probably it was waiting to read all the data but my SSIS flows can't afford to waste resources for so much time. I then tried to create a clustered index on tableB column type, which I thought would take less time. The plan was to modify the query using a cte like this:

;with tmp as 
(
     select * 
     from tableB 
     where type = 'Type_A'
)
select xa.*, s.idA 
from tableA s 
inner join tmp xa on s.idA = xa.idB

Using this query i will have only 200k rows in the tmp table and i will avoid full table scan to filter tableB utilizing all the available indexes so i think it will reasonably improve the speed

However, i had to kill the create query as it was taking more than 1 hour (suspended again, CXPACKET again). So my question is: Is it possible to speed up the index creation process? Is it possible to estimate the time needed for the index creation?

Improve query performance in 300 million rows table

Answers (1)

Related Questions