njb
njb

Reputation: 105

Cannot create CLUSTERED INDEX on a View due to LEFT JOIN or subquery

I have created two options for a query used in a view which return the results I require. I need to rewrite either option so that it can be used in an Indexed View. Both fail when creating a unique clustered index on the view. The first one fails due to the LEFT OUTER JOIN, and the second fails due to a subquery. I believe both will also fail due to a self join.

After finding Creating Indexed Views, there is a large list of TSQL syntax elements that cannot be used. Among them: a derived table, UNION, EXCEPT, INTERSECT, subqueries, Outer or self joins, TOP, ORDER BY, DISTINCT, MAX...

The query should get the largest CompanyID for each unique Company. The StatusName in the Statuses table also needs to be displayed, and I'm only adding that in case it affects the solution. It's currently an INNER JOIN, so it's not causing a problem with creating the index.

Example for the Companies table, with all 3 columns being INT:

CompanyID Company Revision
1         1       1
2         1       2
3         2       1
4         2       2

The query should return:

CompanyID Company Revision
2         1       2
4         2       2

Here are the two options I've created:

SELECT t1.CompanyID, t1.Company, t1.Revision, Statuses.StatusName
FROM dbo.Companies AS t1

LEFT OUTER JOIN dbo.Companies AS t2
ON t1.Company = t2.Company AND t1.CompanyID < t2.CompanyID

INNER JOIN dbo.Statuses
ON dbo.Statuses.StatusID = t1.StatusID

WHERE t2.Company IS NULL

And the other:

SELECT t1.CompanyID, t1.Company, t1.Revision, Statuses.StatusName
FROM dbo.Companies AS t1

INNER JOIN dbo.Statuses
ON dbo.Statuses.StatusID = t1.StatusID

WHERE t1.Company NOT IN (SELECT t2.Company from dbo.Companies AS t2 WHERE t1.CompanyID < t2.CompanyID)

So, my question is, can either query be rewritten to be used in an Indexed View?

I'm using MS SQL Server 2008 R2, and 2005.

Upvotes: 3

Views: 11131

Answers (2)

MatBailie
MatBailie

Reputation: 86706

Unfortunately, you can't.

Not only does your query necessitate a LEFT JOIN, you're LEFT JOINing the same table on to itself. And to quote BooksOnline and your question...

The SELECT statement in the view cannot contain the following Transact-SQL syntax elements: 
 - Outer or self joins.

An alternative option may be to create a real mapping table that you keep updated via a trigger. Records being created/deleted as Companies changes, and records being updated as Statuses changes.


Equally, views are expanded in-line into the queries in which they are used (unless you specifically say otherwise with the NOEXPAND hint). Have you checked the execution plans of your queries to see if you could create more appropriate indexes on the base tables?

EDIT

An alternative query layout, just as an option...

;WITH
  sequenced_companies
AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY Company ORDER BY CompanyID DESC) AS sequence_id,
    *
  FROM
    dbo.companies
)
SELECT
  *
FROM
  sequenced_companies
INNER JOIN
  dbo.statuses
    ON statuses.StatusID = sequenced_companies.StatusID
WHERE
  sequenced_companies.sequence_id = 1

With an index on (Company, CompanyID DESC), this should be quite quick. (Although still not suitable for an indexable view.)

Upvotes: 1

anon
anon

Reputation:

Instead of creating an exclusionary view, why not try the other way:

CREATE VIEW dbo.HighestCompany
AS
  SELECT t1.CompanyID, t1.Company, t1.Revision, s.StatusName
    FROM dbo.Companies AS t1
    INNER JOIN (
      SELECT Company, HighestCompany = MAX(CompanyID) 
      FROM dbo.Companies GROUP BY Company
    ) AS t2
    ON t1.Company = t2.Company
    AND t1.CompanyID = t2.HighestCompany -- not sure if CompanyID is unique
    INNER JOIN dbo.Statuses AS s
    ON s.StatusID = t1.StatusID;

You still won't be able to create an indexed view on this, but it may work a little better than the versions you currently have (depending on several factors, of course, including index on Company and selectivity).

Aside from that, I think to improve performance you'll need to take a look at the indexing strategy on the base tables. Why does your Companies table allow multiple companies with the same name and a different ID? Maybe that is part of the problem, and you should just store the currently relevant company in a separate table.

You could do this as follows (keep in mind I am guessing at data types and optimal indexes here):

CREATE SCHEMA hold AUTHORIZATION dbo;
GO
CREATE SCHEMA cache AUTHORIZATION dbo;
GO
CREATE TABLE dbo.HighestCompany
(
  CompanyID INT, 
  Company NVARCHAR(255) PRIMARY KEY,
  Revision INT,
  StatusName NVARCHAR(64)
);
GO
CREATE TABLE cache.HighestCompany
(
  CompanyID INT, 
  Company NVARCHAR(255) PRIMARY KEY,
  Revision INT,
  StatusName NVARCHAR(64)
);
GO

Now however often you think this data needs to be refreshed, you can run a job that does the following:

TRUNCATE TABLE cache.HighestCompany;

INSERT cache.HighestCompany(CompanyID, Company, Revision, StatusName)
SELECT t1.CompanyID, t1.Company, t1.Revision, s.StatusName
        FROM dbo.Companies AS t1
        INNER JOIN (
          SELECT Company, HighestCompany = MAX(CompanyID) 
          FROM dbo.Companies GROUP BY Company
        ) AS t2
        ON t1.Company = t2.Company
        AND t1.CompanyID = t2.HighestCompany
        INNER JOIN dbo.Statuses AS s
        ON s.StatusID = t1.StatusID;

-- this is a fast, metadata operation that should result
-- in minimal blocking and disruption to end users:
BEGIN TRANSACTION;
  ALTER SCHEMA hold TRANSFER dbo.HighestCompany;
  ALTER SCHEMA dbo TRANSFER cache.HighestCompany;
  ALTER SCHEME cache TRANSFER hold.HighestCompany;
COMMIT TRANSACTION;

If you find the companies change so often, or the data really needs to be up-to-the-second, that this isn't practical, you could do something similar with a trigger as @Dems suggested.

Upvotes: 5

Related Questions