ba0708
ba0708

Reputation: 10599

Dynamic query optimization

I have an assignment for a business which is basically just about extracting data from a database (Microsoft SQL Server 2008). In the process, the users will be able to choose which columns to select, choose which view to select from, and build the WHERE clause. Based on what the user chooses, the SQL query is constructed accordingly. A requirement is that the user may select ANY column from ANY view and filter by ANY column in the WHERE clause. The company does not want the solution to use data warehouse/OLAP, and wants to limit any third party software. So basically they just want a .NET Windows Forms application which dynamically constructs SQL queries based on a GUI and connects to a database.

My concern here is how to optimize the queries. I am by no means good at optimizing SQL queries yet, but my first thought was: what if the user chooses to filter by a column which has no index (in the WHERE clause)? By giving the user so much flexibility, they can potentially construct queries that are so ineffective that they will take a long time to execute.

I realize that the performance can never be good with a lot of data if they filter on columns that have no indices, but is there anything I can do to improve it? For sure I cannot just add indices to all columns.

I am not necessarily just looking for query optimization, but I am also thinking if there are any server tweaks that I can do, such as caching? Basically I am all ears and looking for any advice that can help me improve the performance.

Any suggestions?

Thank you in advance!

Upvotes: 7

Views: 4997

Answers (2)

usr
usr

Reputation: 171246

You really cannot do much except forseeing what users are likely going to do. You are in the good position to have the SQL Server optimizer do the hard work for you (imagine building this on a key-value store!).

I would create indexes on the most likely columns that will be filtered or sorted on. You should try filtering those indexes to non-null values which will reduce storage cost (assuming users will not filter for null values).

You can also try to precompute common joins and aggregations using indexed views. If you are willing to throw insane amounts of RAM at this problem and are willing to have slow writes you can index and materialize the hell out of this database.

Finally, you can offload users queries on a read-only log-shipping target or the like. This will sandbox their horrible queries.

For your queries, you need to dparameterize them, but you do not need to cache them in all cases. If your queries tend to have a big cost (so compilation times are inconsequential) you will want to run them WITH OPTION RECOMPILE so SQL Server can adapt to the exact runtime values of all parameters.

You should also monitor all queries and review them to look for patterns. Your users are likely to run very similar queries all the time. Index for them.

Run sp_updatestats regularly.

Finally, I want to say that there is no very effective solution to this because if there were SQL Server would implement them itself so everyone could benefit.

Upvotes: 4

Louis Ricci
Louis Ricci

Reputation: 21106

First, to improve SQL Server's ability to optimize, cache and compile queries/statements

  • Make sure the UI supports IN and BETWEEN while letting users build their own WHERE clause.
  • Sort your AND or OR conditions so that indexed columns are first, followed by alphabetical order of the other columns.
    • If you're allowing nested ANDs and ORs in your WHERE clause this may be more difficult
  • Use *parameterized queries"
WHERE C1 = 'foo' AND C3 = 'bar' AND C2 = 42
-- if C3 is an indexed column then 
WHERE C3 = @parm1 AND C1 = @parm2 AND C2 = @parm3

Second, to empower users

  • When listing the columns the user can choose from, list the indexed columns first or make them recommended columns to use.
  • Build in some recording of the columns that users choose and the time their query takes to complete. Having this information can help you tune the database in the future and improve the user experience.

EDIT OR -> AND or OR with regards to Martin Smith's comment, this is called Short Circuiting.

Consider the logic

A = True OR B = True OR C = True

If A is indeed True there's no need to evaluate B or C for the condition to be true

A = True AND B = True AND C = True

In this case if A is False there is no need to evaulate B or C for the condition to be false.

Upvotes: 1

Related Questions