user153923
user153923

Reputation:

SQL Limit on "WHERE X IN (...)"

I've got some data I'd like to pull off our SQL server.

This old database does not have any primary keys associated with it, so pulling data is like querying an Excel spreadsheet (what it actually originated as years ago).

I need to run reports on this data, though.

Currently, I get a list of distinct serial numbers for a given time period, then pull all of the records for a given serial number. For a 1-month time frame, this can be 1500 to 3000 serial numbers. The serial number field is formatted as char(20), even though the serial numbers are only 15 characters long.

BEGIN UPDATE

END UPDATE

This process takes a while, but between different serial numbers in the list, I am able to update the Windows Form with a Progress Bar so management knows something is happening and about how much longer to expect.

I am always trying to make this query run faster.

Now, I am thinking about pulling the data I need using a WHERE clause such as:

SELECT Col1, Col2, Col3
FROM Table1
WHERE Serial_Number IN (
  SELECT DISTINCT Serial_Number
  FROM Table1
  WHERE Date_Time Between @startDate AND @endDate
)

My question is: Are there any issues I could run into with this, particularly because we have so many distinct serial numbers during a given time frame.

And, of course, you know someone in Management is going to try running a year's worth of data when they are bored! Then, they are going to try running data since Jesus was born, just because they've got nothing better to do.

Restate Question: Is there a limit to the WHERE clause's IN method that limits the number of items I can pass in?

Upvotes: 2

Views: 225

Answers (4)

user153923
user153923

Reputation:

Apparently, there is no way to tell what the maximum length of the WHERE X IN (...) can be.

For now, this is the answer.

If, at some later point in time, someone comes along and finds something to the contrary, please post that answer and I will mark it as such.

Thanks, Joe

Upvotes: 0

Geoff Montee
Geoff Montee

Reputation: 2597

Honestly, I see no benefit to the WHERE clause as it is written.

You use an expensive inner query, but don't do anything meaningful with the results. I don't even see you getting the Serial_Number in the results anywhere. However, based on your question, it does sound like you need it.

I don't see the need for the DISTINCT keyword for Serial_Number, since the duplicates would not be eliminated in the results in the outer query.

What is wrong with doing this?

SELECT Serial_Number, Col1, Col2, Col3
FROM Table1
WHERE Date_Time Between @startDate AND @endDate

This should do the same thing as your original query. But it would eliminate the expensive nested query.

Just put an index on Date_Time and it should work. This would also eliminate the need for the index on Serial_Number.

Upvotes: 1

mwigdahl
mwigdahl

Reputation: 16578

Well, in the naive case where there are no indexes (which it sounds like is your case) you're going to have to scan over all the rows in Table1 to perform the DISTINCT on Serial_Number anyway. So I'm not sure it's going to help you much.

I would highly recommend the following:

  • Use the execution plan to determine what's going on in your query, and
  • Use that information to add some relevant indexes to speed your operations.

Just from what we see here, it sounds like Date_Time would be a good candidate for a clustered index in Table1.

Edit:

To make a nonunique clustered index as I describe above, you can use the following:

CREATE CLUSTERED INDEX IX_Table1_Date_Time
ON Table1 (Date_Time)

(from http://msdn.microsoft.com/en-us/library/aa258260(v=sql.80).aspx)

This will reorder your table such that all rows are sorted in Date_Time order. Further work with the execution plan will help identify other indexes that may greatly help your performance, depending on the exact types of queries you run.

Upvotes: 1

Larry Lustig
Larry Lustig

Reputation: 50970

Index Serial_Number and Date_Time in Table1 (with separate indexes, not a single compound index) and this should perform fairly well for you unless the table is really truly ginormous.

You might get a little more speed with one index on Serial_Number and the second on (Date_Time, Serial_Number). That second index covers the sub query, allowing it to be answered from the index alone.

Note: I'm suggesting indexes, not primary keys, which don't require uniqueness.

Upvotes: 2

Related Questions