Reputation:
I've got some data I'd like to pull off our SQL server.
This old database does not have any primary keys associated with it, so pulling data is like querying an Excel spreadsheet (what it actually originated as years ago).
I need to run reports on this data, though.
Currently, I get a list of distinct serial numbers for a given time period, then pull all of the records for a given serial number. For a 1-month time frame, this can be 1500 to 3000 serial numbers. The serial number field is formatted as char(20)
, even though the serial numbers are only 15 characters long.
BEGIN UPDATE
Serial_Number
.Date_Time
values are possibleEND UPDATE
This process takes a while, but between different serial numbers in the list, I am able to update the Windows Form with a Progress Bar so management knows something is happening and about how much longer to expect.
I am always trying to make this query run faster.
Now, I am thinking about pulling the data I need using a WHERE
clause such as:
SELECT Col1, Col2, Col3
FROM Table1
WHERE Serial_Number IN (
SELECT DISTINCT Serial_Number
FROM Table1
WHERE Date_Time Between @startDate AND @endDate
)
My question is: Are there any issues I could run into with this, particularly because we have so many distinct serial numbers during a given time frame.
And, of course, you know someone in Management is going to try running a year's worth of data when they are bored! Then, they are going to try running data since Jesus was born, just because they've got nothing better to do.
Restate Question: Is there a limit to the WHERE
clause's IN
method that limits the number of items I can pass in?
Upvotes: 2
Views: 225
Reputation:
Apparently, there is no way to tell what the maximum length of the WHERE X IN (...)
can be.
For now, this is the answer.
If, at some later point in time, someone comes along and finds something to the contrary, please post that answer and I will mark it as such.
Thanks, Joe
Upvotes: 0
Reputation: 2597
Honestly, I see no benefit to the WHERE
clause as it is written.
You use an expensive inner query, but don't do anything meaningful with the results. I don't even see you getting the Serial_Number
in the results anywhere. However, based on your question, it does sound like you need it.
I don't see the need for the DISTINCT
keyword for Serial_Number
, since the duplicates would not be eliminated in the results in the outer query.
What is wrong with doing this?
SELECT Serial_Number, Col1, Col2, Col3
FROM Table1
WHERE Date_Time Between @startDate AND @endDate
This should do the same thing as your original query. But it would eliminate the expensive nested query.
Just put an index on Date_Time
and it should work. This would also eliminate the need for the index on Serial_Number
.
Upvotes: 1
Reputation: 16578
Well, in the naive case where there are no indexes (which it sounds like is your case) you're going to have to scan over all the rows in Table1
to perform the DISTINCT
on Serial_Number
anyway. So I'm not sure it's going to help you much.
I would highly recommend the following:
Just from what we see here, it sounds like Date_Time
would be a good candidate for a clustered index in Table1
.
Edit:
To make a nonunique clustered index as I describe above, you can use the following:
CREATE CLUSTERED INDEX IX_Table1_Date_Time
ON Table1 (Date_Time)
(from http://msdn.microsoft.com/en-us/library/aa258260(v=sql.80).aspx)
This will reorder your table such that all rows are sorted in Date_Time order. Further work with the execution plan will help identify other indexes that may greatly help your performance, depending on the exact types of queries you run.
Upvotes: 1
Reputation: 50970
Index Serial_Number and Date_Time in Table1 (with separate indexes, not a single compound index) and this should perform fairly well for you unless the table is really truly ginormous.
You might get a little more speed with one index on Serial_Number and the second on (Date_Time, Serial_Number). That second index covers the sub query, allowing it to be answered from the index alone.
Note: I'm suggesting indexes, not primary keys, which don't require uniqueness.
Upvotes: 2