Reputation: 1040
I'm writing complex ad-hoc queries to extract data from a client's SQL Server. Some of the queries take a half-hour or more to run and I need them to execute in 5 minutes or less. Unfortunately, I don't have permissions to view a showplan or on any of the sys.dm_
views to help me optimize them.
I do have access to all of the INFORMATION_SCHEMA
and sys
catalog views, so I know what indexes exist, and I can also use STATISTICS TIME
and IO
to help me measure the effectiveness of the changes.
Since I don't have the ability to compare showplans, how can I best use these tools to guide my intuitions to minimize the time-consuming trial and error? Or does anyone have any innovative solutions that have worked for them in the past.
EDIT: I only have db_DataReader permissions--I can't create new objects or indexes, but I can create temp tables and index them.
Upvotes: 4
Views: 3305
Reputation: 322
Study the internals of SQL Server and practice the art of query plan prediction.
Practice by writing queries against data you know cold (cardinality, density, distribution, etc). Make sure the queries are complex enough that it's a challenge. Then see if you can picture the plan in your head. Does that compare to the actual. When you start to figure out when SQL will pick LOOP v HASH v MERGE join types and other such operations like sorts, tops, etc ... Then you are starting to get a feel for it.
Regarding Internals documentation, there's never been a book to top this one: https://www.amazon.com/Gurus-Guide-SQL-Server-Boxed/dp/0321287509/. Since Ken is no longer with us :(, I'm not sure what the best current book is. The way I learned how to do this by feel is studying internals.
You can download a free version of SQL Server (many versions) so that you can look at query plans: https://www.microsoft.com/en-us/download/details.aspx?id=42299. Use that for practice.
There are many well documented sample databases for practice: https://www.codeproject.com/Articles/20987/HowTo-Install-the-Northwind-and-Pubs-Sample-Databa https://blogs.msdn.microsoft.com/samlester/2012/08/23/finding-the-correct-version-of-the-adventureworks-sql-server-sample-database/
Before the days of graphical query plans, we used time and io stats and showplan.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-time-transact-sql https://learn.microsoft.com/en-us/sql/t-sql/statements/set-showplan-all-transact-sql
SET STATISTICS IO ON
SET STATISTICS TIME ON
go
SET SHOWPLAN_ALL ON
go
SELECT o.*, od.*
FROM [dbo].[Orders] o
INNER JOIN [dbo].[Order Details] od
ON o.OrderID = od.OrderID
WHERE od.OrderID = 10248
go
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
------------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ -------- ------------- ------------- ------------- ------------- ----------- ---------------- ---------- -------- ---------------------------------------------------------------- -------- ------------------
SET STATISTICS IO ON 1 1 0 NULL NULL 1 NULL NULL NULL NULL NULL NULL NULL NULL SET STATS 0 NULL
SET STATISTICS TIME ON 2 2 0 NULL NULL 2 NULL NULL NULL NULL NULL NULL NULL NULL SET STATS 0 NULL
(2 row(s) affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
---------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ -------- ------------- ------------- ------------- ------------- ----------- ---------------- ---------- -------- ---------------------------------------------------------------- -------- ------------------
SET SHOWPLAN_ALL ON 1 1 0 NULL NULL 1 NULL NULL NULL NULL NULL NULL NULL NULL SET ON/OFF 0 NULL
(1 row(s) affected)
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions
----------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ --------------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------- ------------- ------------- ----------- ---------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ---------------------------------------------------------------- -------- ------------------
SELECT o.*, od.*
FROM [dbo].[Orders] o
INNER JOIN [dbo].[Order Details] od
ON o.OrderID = od.OrderID
WHERE od.OrderID = 10248 1 1 0 NULL NULL 1 NULL 3 NULL NULL NULL 0.00658094 NULL NULL SELECT 0 NULL
|--Nested Loops(Inner Join) 1 2 1 Nested Loops Inner Join NULL NULL 3 0 1.254E-05 254 0.00658094 [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [o].[RequiredDate], [o].[ShippedDate], [o].[ShipVia], [o].[Freight], [o].[ShipName], [o].[ShipAddress], [o].[ShipCity], [o].[ShipRegion], [o].[ShipPostalCode], [o].[ShipCountry], [od].[Ord NULL PLAN_ROW 0 1
|--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]), SEEK:([o].[OrderID]=(10248)) ORDERED FORWARD) 1 3 2 Clustered Index Seek Clustered Index Seek OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]), SEEK:([o].[OrderID]=(10248)) ORDERED FORWARD [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [o].[RequiredDate], [o].[ShippedDate], [o].[ShipVia], [o].[Freight], [o].[ShipName], [o].[ShipAddress], [o].[ShipCity], [o].[ShipRegion], [o].[ShipPostalCode], [o].[ShipCountry] 1 0.003125 0.0001581 231 0.0032831 [o].[OrderID], [o].[CustomerID], [o].[EmployeeID], [o].[OrderDate], [o].[RequiredDate], [o].[ShippedDate], [o].[ShipVia], [o].[Freight], [o].[ShipName], [o].[ShipAddress], [o].[ShipCity], [o].[ShipRegion], [o].[ShipPostalCode], [o].[ShipCountry] NULL PLAN_ROW 0 1
|--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Order Details].[PK_Order_Details] AS [od]), SEEK:([od].[OrderID]=(10248)) ORDERED FORWARD) 1 4 2 Clustered Index Seek Clustered Index Seek OBJECT:([Northwind].[dbo].[Order Details].[PK_Order_Details] AS [od]), SEEK:([od].[OrderID]=(10248)) ORDERED FORWARD [od].[OrderID], [od].[ProductID], [od].[UnitPrice], [od].[Quantity], [od].[Discount] 3 0.003125 0.0001603 29 0.0032853 [od].[OrderID], [od].[ProductID], [od].[UnitPrice], [od].[Quantity], [od].[Discount] NULL PLAN_ROW 0 1
(4 row(s) affected)
Upvotes: 0
Reputation: 391
The practical way is: instead of running complex queries in client production databases, extract the necessary data from them. And run the rest of your query to your own instance or testing environment.
Upvotes: 0
Reputation: 31785
Here's the strategy I would explore. It's based on the rule of thumb that the overwhelming majority of query optimization is achieved by using covering indexes to avoid Table Scans and Hash Match joins.
Look at each of the tables you need to include in your query, and the columns that you need to use in either the JOIN ON
clause, or the WHERE
clause. If the table has an index that includes all those columns, then you can just go ahead and use that table in your query. You can even use a query hint to force the query to use that covering index, but it shouldn't be needed.
If table doesn't have such an index, then you need to populate a temp table with the minimum amount of data you can get using an existing index.
For a super-simple example, say there's a table with ten columns and a million rows that only has an Index on Column1
. In your query, you need to JOIN
to that table on Column2
and include Column3
(only) in your resultset.
But on top of that, maybe for your final resultset, you're only interested in data that has Column1
values between 1-100.
I would create a temp table that only has Column2
& Column3
, with a Clustered Index on Column2
, and populate it with an INSERT..SELECT
that gets data from the original table using a WHERE
clause filter on Column1
to get only the rows I need.
This way you are building indexes on the smallest possible table which hopefully will get you a noticeable performance gain over using queries that scan the original tables.
Upvotes: 1