Reputation: 9

Is there any reason why joining two views over quadruples their combined run times

I have 2 views that are selecting large datasets from an external source. They aren't doing any calculations or aggregations, just a long select statement.

I am using an

INNER JOIN

to link the two views based on a GUID.

The individual selections from each view are as follows.

view1, 3:08 Run time, 174,842 Records retrieved

view2, 0:02 Run Time, 93,493 Records retrieved

When I Join them, I get the following

Join, 14:32 Run Time, 177,753 records retrieved

So far, I've tried

LEFT JOIN 
RIGHT JOIN 
INNER JOIN
JOIN

I've tried joining view1 to view2 vs joining view2 to view1. I've tried calling one view then selecting from that while joining to the other view.

Nothing seems to impact it.

SQL below for reference

SELECT
    v1.guid, 
    CONVERT(DATE, v1.CreatedOn) AS CreatedOn,
    field1, 
    field2, 
    field3,
    field4,
    field5
FROM
    View1 v1
    INNER JOIN View2 V2 ON v1.guid = v2.guid
WHERE
    field6 = 'value'

(obligatory those aren't the actual field names)

I'm getting the expected result, its just taking way too long for its purpose.

Any help optimising would be appreciated

Upvotes: 0

Answers (2)

Rob_O

Reputation: 9

It turns out the answer was don't select 100+ columns into a view then select 1 column out of it! who would've thought it

Upvotes: 0

gotqn

Reputation: 43666

Try the following:

SELECT *
INTO #view1
FROM view1

SELECT *
INTO #view2
FROM view2

SELECT
    v1.guid, 
    CONVERT(DATE, v1.CreatedOn) AS CreatedOn,
    field1, 
    field2, 
    field3,
    field4,
    field5
FROM #View1 v1
INNER JOIN #View2 V2 
    ON v1.guid = v2.guid
WHERE
    field6 = 'value'

The first two statements materialized the view data in temporary tables. If the engine is not able to build good execution plan in your original query, the above should help.

If the above is not helping, try to defined the temporary tables the better way defining primary keys. Something like this:

CREATE TABLE #view1
(
    guid UNIQUEIDENTIFIER PRIMARY KEY
    ....
)

INSERT INTO #view1
SELECT *
FROM view1

So, in this way the data should be ordered by GUID and in theory we should get faster join.

The above can lead to better performance but we have a bigger issue here - you are joining by UNIQUEIDENTIFIER - I know you may see people using this as primary key, but you will finding joining by int or bigint faster. If you need such guid column in order not to expose internal IDs in your application or something else, this does not not mean you can't have integer column to perform the joins in the SQL.

Also, if you are not able to store the data in the view in temporary tables, you can check how indexed views are created and if you can - store the data that is needed only (apply filtering criteria in advance) - for example:

INSERT INTO #view1
SELECT *
FROM view1
WHERE field6 = 'value'

So, now the table has fewer rows, right?

Upvotes: 1

Is there any reason why joining two views over quadruples their combined run times

Answers (2)

Related Questions