Reputation: 150
I am dealing with a SQL query that requires many self-joins on the same table to compute aggregate-style calculations, such as the following:
SELECT
    tab1.foo,
    tab1.bar,
    tab1.baz,
    tab1.bam / tab2.bam AS bam_ratio,
    RANK() OVER (PARTITION BY tab1.bar, tab1.baz ORDER BY tab1.foo DESC) AS [RANK]
FROM
    (SELECT foo, bar, baz, bam
     FROM OPENDATASOURCE(server).dbo.table1
     WHERE qux = 1 AND quux = 'A' AND corge = 2) tab1
    JOIN
    (SELECT foo, bar, baz, bam
     FROM OPENDATASOURCE(server).dbo.table1
     WHERE qux = 1 AND quux = 'B' AND corge = 2) tab2
        ON tab1.bar = tab2.bar
        AND tab1.baz = tab2.baz
        AND tab1.foo = tab2.foo
The issue is that each of these tables has few columns but tens of millions of records. This creates a case where indexing does not adequately improve performance (the query above touches every row in table1).
What I have noticed is that these derived-table queries tend to differ in only one or two minor respects, such as quux in the example. The tables also lack primary keys, so more rows are read than necessary, which limits the usefulness of the indexes and seems redundant.
Are there ways to improve performance beyond simple indexing? I am willing to alter the underlying tables and to create orphan tables.
Upvotes: 2
Views: 81
Reputation: 25132
One performance issue you could improve is the join across servers. This overhead occurs with OPENROWSET, OPENDATASOURCE, linked servers, etc.
The easiest way to mitigate it is to first pull the data from your remote data sources into temp tables or staging tables. This stores the data locally in tempdb. Then join the temp tables. SQL Server will automatically create statistics on these tables, which will greatly improve performance, and since the data is now local, you can also add indexes to speed up your joins if you'd like.
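As a sketch of that approach, using the table and column names from the question (the remote data source reference is kept as in the question and would need your actual provider/connection details):

```sql
-- Pull only the rows and columns you need from the remote source
-- into local temp tables. SELECT ... INTO creates the table in tempdb.
SELECT foo, bar, baz, bam
INTO #tab1
FROM OPENDATASOURCE(server).dbo.table1
WHERE qux = 1 AND quux = 'A' AND corge = 2;

SELECT foo, bar, baz, bam
INTO #tab2
FROM OPENDATASOURCE(server).dbo.table1
WHERE qux = 1 AND quux = 'B' AND corge = 2;

-- Optional: index the join keys locally so the join below is cheap.
CREATE CLUSTERED INDEX ix_tab1 ON #tab1 (bar, baz, foo);
CREATE CLUSTERED INDEX ix_tab2 ON #tab2 (bar, baz, foo);

-- The original query, now running entirely against local data.
SELECT
    t1.foo,
    t1.bar,
    t1.baz,
    t1.bam / t2.bam AS bam_ratio,
    RANK() OVER (PARTITION BY t1.bar, t1.baz ORDER BY t1.foo DESC) AS [RANK]
FROM #tab1 t1
JOIN #tab2 t2
    ON t1.bar = t2.bar
    AND t1.baz = t2.baz
    AND t1.foo = t2.foo;
```

Because both filtered subsets come from the same remote table, you could also pull them in a single pass (e.g. `WHERE quux IN ('A', 'B')` into one staging table) and split or self-join locally, which halves the cross-server traffic.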
Upvotes: 1