Mike Perrenoud
Mike Perrenoud

Reputation: 67918

Is WITH the replacement for a #TEMP table?

So based off the way I've seen WITH used, and the documentation at MSDN:

Specifies a temporary named result set, known as a common table expression (CTE).

it appears that the WITH is the replacement for #TEMP tables. Is that correct?

Upvotes: 5

Views: 14734

Answers (3)

ErikE
ErikE

Reputation: 50271

No. CTEs—introduced by WITH—don't replace temp tables, although in some cases they can be used where one might have used a temp table in the past.

WITH is really nothing more than a derived table, with the difference that it is introduced before the query instead of inline, and is given an alias which can then be used as a table throughout the query multiple times.

A derived table is a complete query, inside of parentheses, that is used as if it were a real table. Views and table-valued functions are also considered derived tables, but we're focusing on the kind that is defined inline. Here is an example:

SELECT
   C.Name,
   S.SalesTotal
FROM
   dbo.Customer C
   INNER JOIN (
      SELECT
         O.CustomerID,
         SalesTotal = Sum(OrderTotal)
      FROM
         dbo.CustomerOrder O
      GROUP BY
         O.CustomerID
   ) S
      ON C.CustomerID = S.CustomerID;

We have a completely intact query that returns its own rowset (the GROUP BY query). By placing this inside of parentheses and assigning it an alias S, we can now use it like a table. We could join more tables to this one. But, we have only joined to this table once.

To convert this to a CTE, we make a very simple change:

WITH SalesTotals AS (
   SELECT
      O.CustomerID,
      SalesTotal = Sum(OrderTotal)
   FROM
      dbo.CustomerOrder O
   GROUP BY
      O.CustomerID
)
SELECT
   C.Name,
   S.SalesTotal
FROM
   dbo.Customer C
   INNER JOIN SalesTotals S
      ON C.CustomerID = S.CustomerID
   -- and for an example of using the CTE twice:
   INNER JOIN (
      SELECT Avg(SalesTotal)
      FROM SalesTotals
   ) A (AverageSalesTotal)
      ON S.SalesTotal >= A.AverageSalesTotal;

Now, a temp table is a completely different animal. It has very important differences from a CTE or derived table:

  • A temp table persists over many queries (for the lifetime of the client connection, or until explicitly dropped) but a CTE only "exists" for one query.
  • A CTE, while logically a "single" table, is likely to have its data generated multiple times if used multiple times in a query. A temp table's data would simply be read as any other "real" table. In the above example, the Avg(SalesTotal) calculation, in versions of SQL Server through at least 2012, will involve a completely separate operation of performing the SalesTotals aggregate a second time. While it is possible for the engine to materialize the results of the CTE, so far SQL Server has not done this. It is notable that other DBMSes such as Oracle may materialize the results of a CTE. In any case, you should be aware that this double-querying can have (of course!) serious performance implications.
  • A temp table has column statistics automatically generated for it and this can aid the query optimizer in choosing better execution plans. A CTE's "final" rowset has no statistics—the statistics of the underlying tables are used.
  • A temp table can be added to incrementally or have rows deleted from it by multiple or repeated statements. It can be updated.
  • A temp table can be modified to add or remove columns or change data types.
  • A temp table can have clustered and non-clustered indexes and constraints.
  • You cannot use a temp table in any way inside a user-defined function.
  • A CTE, while appearing to logically segregate parts of a query, does no such thing. CTEs are perfect candidates for predicate push-down, elimination if it is determined they do not affect the final rowset (or some of their tables or joins eliminated), or they may be subject to unexpected expression evaluation order. For example, in a CTE you might return only the numeric strings from a text column, and in the outer query try to convert these strings to a numeric data type, but to your surprise you get an error about attempting to convert non-number strings to a numeric data type. That is because the optimizer is free to reorganize your query in any way it pleases, and may do the conversion to numeric before the filter for number-containing strings. A temp table, while requiring two statements (one to insert the data, and a second to join to that data) would not have this problem as the queries are distinct and the data truly "materialized" as expected before using it.

Last, a CTE can do something a temp table cannot: it can be recursive. In Oracle this is expressed through CONNECT BY, and in SQL Server it is done with a UNION ALL SELECT inside the CTE that is allowed to refer to the CTE's own alias.

Be careful with CTEs. They are a great abstraction, but are nothing more than that, and you can run into serious trouble with them. Generating a million rows can be done with a recursive CTE one row at a time, but it's the WORST possible way by like a hundred times over or more.

There is another special kind of temp table in SQL Server 2005 and up called a "table variable" that is very much like a temp table (and kept in tempdb exactly the same), with a few notable exceptions:

  • It only lasts the duration of the batch, not the connection.
  • You can use a table variable inside a user-defined function. Some types of UDFs require one.
  • It can only have inline constraints declared (such as primary keys or uniqueness), and while it can have rows updated/inserted/deleted, its schema cannot be modified after declaration in any way, so no adding/removing columns, changing data types, or adding indexes.
  • It does not collect statistics.
  • It can be passed as a parameter (table-valued parameter) in SQL Server 2008 and up.

Upvotes: 19

Aaron Carta
Aaron Carta

Reputation: 58

The SQL optimizer does a much better job today at choosing great execution plans, but when joining more than 10 tables, especially with some large tables and views and needing to use multiple filters, it often does not perform optimally. I still find there is nothing as fast as using #TEMP tables and breaking the queries down into much smaller subsets before joining them together. NOTE: it is rare that I find that adding indexes to the #TEMP tables improves performance.

Upvotes: 1

JordanMazurke
JordanMazurke

Reputation: 1123

No, that is not correct. They are both separate pieces of functionality and each have their individual uses.

For instance, CTEs are good for small bits of data while temp tables are usually better suited to larger datasets. Temp tables can be indexed and their performance improved whereas CTEs can not.

I would spend a little time reading through the MSDN documentation and looking at particular instances where you would use one or the other

Upvotes: 0

Related Questions