Reputation: 63949
How do I parameterize a query containing an IN
clause with a variable number of arguments, like this one?
SELECT * FROM Tags
WHERE Name IN ('ruby','rails','scruffy','rubyonrails')
ORDER BY Count DESC
In this query, the number of arguments could be anywhere from 1 to 5.
I would prefer not to use a dedicated stored procedure for this (or XML), but if there is some elegant way specific to SQL Server 2008, I am open to that.
Upvotes: 1143
Views: 453701
Reputation: 85625
You can parameterize each value, so something like:
string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
string cmdText = "SELECT * FROM Tags WHERE Name IN ({0})";
string[] paramNames = tags.Select(
(s, i) => "@tag" + i.ToString()
).ToArray();
string inClause = string.Join(", ", paramNames);
using (SqlCommand cmd = new SqlCommand(string.Format(cmdText, inClause)))
{
for (int i = 0; i < paramNames.Length; i++)
{
cmd.Parameters.AddWithValue(paramNames[i], tags[i]);
}
}
Which will give you:
cmd.CommandText = "SELECT * FROM Tags WHERE Name IN (@tag0, @tag1, @tag2, @tag3)"
cmd.Parameters["@tag0"] = "ruby"
cmd.Parameters["@tag1"] = "rails"
cmd.Parameters["@tag2"] = "scruffy"
cmd.Parameters["@tag3"] = "rubyonrails"
No, this is not open to SQL injection. The only injected text into CommandText is not based on user input. It's solely based on the hardcoded "@tag" prefix, and the index of an array. The index will always be an integer, is not user generated, and is safe.
The user inputted values are still stuffed into parameters, so there is no vulnerability there.
Edit:
Injection concerns aside, take care to note that constructing the command text to accomodate a variable number of parameters (as above) impede's SQL server's ability to take advantage of cached queries. The net result is that you almost certainly lose the value of using parameters in the first place (as opposed to merely inserting the predicate strings into the SQL itself).
Not that cached query plans aren't valuable, but IMO this query isn't nearly complicated enough to see much benefit from it. While the compilation costs may approach (or even exceed) the execution costs, you're still talking milliseconds.
If you have enough RAM, I'd expect SQL Server would probably cache a plan for the common counts of parameters as well. I suppose you could always add five parameters, and let the unspecified tags be NULL - the query plan should be the same, but it seems pretty ugly to me and I'm not sure that it'd worth the micro-optimization (although, on Stack Overflow - it may very well be worth it).
Also, SQL Server 7 and later will auto-parameterize queries, so using parameters isn't really necessary from a performance standpoint - it is, however, critical from a security standpoint - especially with user inputted data like this.
Upvotes: 774
Reputation: 137
If you are interested in doing everything in SQL procedures and functions (as the initial question was formulated), and not concatenating strings in C#, you can use a fairly simple method:
DECLARE @testFilter VARCHAR(50) = 'test1,test2,test3'
SELECT * FROM testTable WHERE testField IN (SELECT value FROM STRING_SPLIT(@testFilter, ','))
Upvotes: 0
Reputation: 108370
The original question was "How do I parameterize a query ..."
This is not an answer to that original question. There are some very good demonstrations of how to do that, in other answers.
See the first answer from Mark Brackett (the first answer starting "You can parameterize each value") and Mark Brackett's second answer for the preferred answer that I (and 231 others) upvoted. The approach given in his answer allows 1) for effective use of bind variables, and 2) for predicates that are sargable.
Selected answer
I am addressing here the approach given in Joel Spolsky's answer, the answer "selected" as the right answer.
Joel Spolsky's approach is clever. And it works reasonably, it's going to exhibit predictable behavior and predictable performance, given "normal" values, and with the normative edge cases, such as NULL and the empty string. And it may be sufficient for a particular application.
But in terms generalizing this approach, let's also consider the more obscure corner cases, like when the Name
column contains a wildcard character (as recognized by the LIKE predicate.) The wildcard character I see most commonly used is %
(a percent sign.). So let's deal with that here now, and later go on to other cases.
Some problems with % character
Consider a Name value of 'pe%ter'
. (For the examples here, I use a literal string value in place of the column name.) A row with a Name value of `'pe%ter' would be returned by a query of the form:
select ...
where '|peanut|butter|' like '%|' + 'pe%ter' + '|%'
But that same row will not be returned if the order of the search terms is reversed:
select ...
where '|butter|peanut|' like '%|' + 'pe%ter' + '|%'
The behavior we observe is kind of odd. Changing the order of the search terms in the list changes the result set.
It almost goes without saying that we might not want pe%ter
to match peanut butter, no matter how much he likes it.
Obscure corner case
(Yes, I will agree that this is an obscure case. Probably one that is not likely to be tested. We wouldn't expect a wildcard in a column value. We may assume that the application prevents such a value from being stored. But in my experience, I've rarely seen a database constraint that specifically disallowed characters or patterns that would be considered wildcards on the right side of a LIKE
comparison operator.
Patching a hole
One approach to patching this hole is to escape the %
wildcard character. (For anyone not familiar with the escape clause on the operator, here's a link to the SQL Server documentation.
select ...
where '|peanut|butter|'
like '%|' + 'pe\%ter' + '|%' escape '\'
Now we can match the literal %. Of course, when we have a column name, we're going to need to dynamically escape the wildcard. We can use the REPLACE
function to find occurrences of the %
character and insert a backslash character in front of each one, like this:
select ...
where '|pe%ter|'
like '%|' + REPLACE( 'pe%ter' ,'%','\%') + '|%' escape '\'
So that solves the problem with the % wildcard. Almost.
Escape the escape
We recognize that our solution has introduced another problem. The escape character. We see that we're also going to need to escape any occurrences of escape character itself. This time, we use the ! as the escape character:
select ...
where '|pe%t!r|'
like '%|' + REPLACE(REPLACE( 'pe%t!r' ,'!','!!'),'%','!%') + '|%' escape '!'
The underscore too
Now that we're on a roll, we can add another REPLACE
handle the underscore wildcard. And just for fun, this time, we'll use $ as the escape character.
select ...
where '|p_%t!r|'
like '%|' + REPLACE(REPLACE(REPLACE( 'p_%t!r' ,'$','$$'),'%','$%'),'_','$_') + '|%' escape '$'
I prefer this approach to escaping because it works in Oracle and MySQL as well as SQL Server. (I usually use the \ backslash as the escape character, since that's the character we use in regular expressions. But why be constrained by convention!
Those pesky brackets
SQL Server also allows for wildcard characters to be treated as literals by enclosing them in brackets []
. So we're not done fixing yet, at least for SQL Server. Since pairs of brackets have special meaning, we'll need to escape those as well. If we manage to properly escape the brackets, then at least we won't have to bother with the hyphen -
and the carat ^
within the brackets. And we can leave any %
and _
characters inside the brackets escaped, since we'll have basically disabled the special meaning of the brackets.
Finding matching pairs of brackets shouldn't be that hard. It's a little more difficult than handling the occurrences of singleton % and _. (Note that it's not sufficient to just escape all occurrences of brackets, because a singleton bracket is considered to be a literal, and doesn't need to be escaped. The logic is getting a little fuzzier than I can handle without running more test cases.)
Inline expression gets messy
That inline expression in the SQL is getting longer and uglier. We can probably make it work, but heaven help the poor soul that comes behind and has to decipher it. As much of a fan I am for inline expressions, I'm inclined not use one here, mainly because I don't want to have to leave a comment explaining the reason for the mess, and apologizing for it.
A function where ?
Okay, so, if we don't handle that as an inline expression in the SQL, the closest alternative we have is a user defined function. And we know that won't speed things up any (unless we can define an index on it, like we could with Oracle.) If we've got to create a function, we might better do that in the code that calls the SQL statement.
And that function may have some differences in behavior, dependent on the DBMS and version. (A shout out to all you Java developers so keen on being able to use any database engine interchangeably.)
Domain knowledge
We may have specialized knowledge of the domain for the column, (that is, the set of allowable values enforced for the column. We may know a priori that the values stored in the column will never contain a percent sign, an underscore, or bracket pairs. In that case, we just include a quick comment that those cases are covered.
The values stored in the column may allow for % or _ characters, but a constraint may require those values to be escaped, perhaps using a defined character, such that the values are LIKE comparison "safe". Again, a quick comment about the allowed set of values, and in particular which character is used as an escape character, and go with Joel Spolsky's approach.
But, absent the specialized knowledge and a guarantee, it's important for us to at least consider handling those obscure corner cases, and consider whether the behavior is reasonable and "per the specification".
Other issues recapitulated
I believe others have already sufficiently pointed out some of the other commonly considered areas of concern:
SQL injection (taking what would appear to be user supplied information, and including that in the SQL text rather than supplying them through bind variables. Using bind variables isn't required, it's just one convenient approach to thwart with SQL injection. There are other ways to deal with it:
optimizer plan using index scan rather than index seeks, possible need for an expression or function for escaping wildcards (possible index on expression or function)
using literal values in place of bind variables impacts scalability
Conclusion
I like Joel Spolsky's approach. It's clever. And it works.
But as soon as I saw it, I immediately saw a potential problem with it, and it's not my nature to let it slide. I don't mean to be critical of the efforts of others. I know many developers take their work very personally, because they invest so much into it and they care so much about it. So please understand, this is not a personal attack. What I'm identifying here is the type of problem that crops up in production rather than testing.
Upvotes: 193
Reputation: 85625
For SQL Server 2008, you can use a table valued parameter. It's a bit of work, but it is arguably cleaner than my other method.
First, you have to create a type
CREATE TYPE dbo.TagNamesTableType AS TABLE ( Name nvarchar(50) )
Then, your ADO.NET code looks like this:
string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
cmd.CommandText = "SELECT Tags.* FROM Tags JOIN @tagNames as P ON Tags.Name = P.Name";
// value must be IEnumerable<SqlDataRecord>
cmd.Parameters.AddWithValue("@tagNames", tags.AsSqlDataRecord("Name")).SqlDbType = SqlDbType.Structured;
cmd.Parameters["@tagNames"].TypeName = "dbo.TagNamesTableType";
// Extension method for converting IEnumerable<string> to IEnumerable<SqlDataRecord>
public static IEnumerable<SqlDataRecord> AsSqlDataRecord(this IEnumerable<string> values, string columnName) {
if (values == null || !values.Any()) return null; // Annoying, but SqlClient wants null instead of 0 rows
var firstRecord = values.First();
var metadata= new SqlMetaData(columnName, SqlDbType.NVarChar, 50); //50 as per SQL Type
return values.Select(v =>
{
var r = new SqlDataRecord(metadata);
r.SetValues(v);
return r;
});
}
Update As Per @Doug
Please try to avoid var metadata = SqlMetaData.InferFromValue(firstRecord, columnName);
It's set first value length, so if first value is 3 characters then its set max length 3 and other records will truncated if more then 3 characters.
So, please try to use: var metadata= new SqlMetaData(columnName, SqlDbType.NVarChar, maxLen);
Note: -1
for max length.
Upvotes: 265
Reputation: 175556
In SQL Server 2016+
you could use STRING_SPLIT
function:
DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails';
SELECT *
FROM Tags
WHERE Name IN (SELECT [value] FROM STRING_SPLIT(@names, ','))
ORDER BY [Count] DESC;
or:
DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails';
SELECT t.*
FROM Tags t
JOIN STRING_SPLIT(@names,',')
ON t.Name = [value]
ORDER BY [Count] DESC;
The accepted answer will of course work and it is one of the way to go, but it is anti-pattern.
E. Find rows by list of values
This is replacement for common anti-pattern such as creating a dynamic SQL string in application layer or Transact-SQL, or by using LIKE operator:
SELECT ProductId, Name, Tags FROM Product WHERE ',1,2,3,' LIKE '%,' + CAST(ProductId AS VARCHAR(20)) + ',%';
Addendum:
To improve the STRING_SPLIT
table function row estimation, it is a good idea to materialize splitted values as temporary table/table variable:
DECLARE @names NVARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails,sql';
CREATE TABLE #t(val NVARCHAR(120));
INSERT INTO #t(val) SELECT s.[value] FROM STRING_SPLIT(@names, ',') s;
SELECT *
FROM Tags tg
JOIN #t t
ON t.val = tg.TagName
ORDER BY [Count] DESC;
Related: How to Pass a List of Values Into a Stored Procedure
SQL Server 2008
. Because this question is often used as duplicate, I've added this answer as reference.
Upvotes: 54
Reputation: 1547
Step 1:-
string[] Ids = new string[] { "3", "6", "14" };
string IdsSP = string.Format("'|{0}|'", string.Join("|", Ids));
Step 2:-
@CurrentShipmentStatusIdArray [nvarchar](255) = NULL
Step 3:-
Where @CurrentShipmentStatusIdArray is null or @CurrentShipmentStatusIdArray LIKE '%|' + convert(nvarchar,Shipments.CurrentShipmentStatusId) + '|%'
or
Where @CurrentShipmentStatusIdArray is null or @CurrentShipmentStatusIdArray LIKE '%|' + Shipments.CurrentShipmentStatusId+ '|%'
Upvotes: 1
Reputation: 117
In SQL SERVER 2016 or higher you can use STRING_SPLIT.
DECLARE @InParaSeprated VARCHAR(MAX) = 'ruby,rails,scruffy,rubyonrails'
DECLARE @Delimeter VARCHAR(10) = ','
SELECT
*
FROM
Tags T
INNER JOIN STRING_SPLIT(@InputParameters,@Delimeter) SS ON T.Name = SS.value
ORDER BY
Count DESC
I use this because some times join faster than Like Operator works in my queries.
In addition you can put unlimited number of inputs in any separated format that you like.
I like this ..
Upvotes: 1
Reputation: 5471
You can do this in a reusable way by doing the following -
public static class SqlWhereInParamBuilder
{
public static string BuildWhereInClause<t>(string partialClause, string paramPrefix, IEnumerable<t> parameters)
{
string[] parameterNames = parameters.Select(
(paramText, paramNumber) => "@" + paramPrefix + paramNumber.ToString())
.ToArray();
string inClause = string.Join(",", parameterNames);
string whereInClause = string.Format(partialClause.Trim(), inClause);
return whereInClause;
}
public static void AddParamsToCommand<t>(this SqlCommand cmd, string paramPrefix, IEnumerable<t> parameters)
{
string[] parameterValues = parameters.Select((paramText) => paramText.ToString()).ToArray();
string[] parameterNames = parameterValues.Select(
(paramText, paramNumber) => "@" + paramPrefix + paramNumber.ToString()
).ToArray();
for (int i = 0; i < parameterNames.Length; i++)
{
cmd.Parameters.AddWithValue(parameterNames[i], parameterValues[i]);
}
}
}
For more details have a look at this blog post - Parameterized SQL WHERE IN clause c#
Upvotes: 4
Reputation: 1323
If we have strings stored inside the IN clause with the comma(,) delimited, we can use the charindex function to get the values. If you use .NET, then you can map with SqlParameters.
DDL Script:
CREATE TABLE Tags
([ID] int, [Name] varchar(20))
;
INSERT INTO Tags
([ID], [Name])
VALUES
(1, 'ruby'),
(2, 'rails'),
(3, 'scruffy'),
(4, 'rubyonrails')
;
T-SQL:
DECLARE @Param nvarchar(max)
SET @Param = 'ruby,rails,scruffy,rubyonrails'
SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0
You can use the above statement in your .NET code and map the parameter with SqlParameter.
EDIT: Create the table called SelectedTags using the following script.
DDL Script:
Create table SelectedTags
(Name nvarchar(20));
INSERT INTO SelectedTags values ('ruby'),('rails')
T-SQL:
DECLARE @list nvarchar(max)
SELECT @list=coalesce(@list+',','')+st.Name FROM SelectedTags st
SELECT * FROM Tags
WHERE CharIndex(Name,@Param)>0
Upvotes: 10
Reputation: 33637
Here's a quick-and-dirty technique I have used:
SELECT * FROM Tags
WHERE '|ruby|rails|scruffy|rubyonrails|'
LIKE '%|' + Name + '|%'
So here's the C# code:
string[] tags = new string[] { "ruby", "rails", "scruffy", "rubyonrails" };
const string cmdText = "select * from tags where '|' + @tags + '|' like '%|' + Name + '|%'";
using (SqlCommand cmd = new SqlCommand(cmdText)) {
cmd.Parameters.AddWithValue("@tags", string.Join("|", tags);
}
Two caveats:
LIKE "%...%"
queries are not indexed.|
, blank, or null tags or this won't workThere are other ways to accomplish this that some people may consider cleaner, so please keep reading.
Upvotes: 326
Reputation: 129
Create a temp table where names are stored, and then use the following query:
select * from Tags
where Name in (select distinct name from temp)
order by Count desc
Upvotes: 1
Reputation: 1062492
If you are calling from .NET, you could use Dapper dot net:
string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = dataContext.Query<Tags>(@"
select * from Tags
where Name in @names
order by Count desc", new {names});
Here Dapper does the thinking, so you don't have to. Something similar is possible with LINQ to SQL, of course:
string[] names = new string[] {"ruby","rails","scruffy","rubyonrails"};
var tags = from tag in dataContext.Tags
where names.Contains(tag.Name)
orderby tag.Count descending
select tag;
Upvotes: 54
Reputation: 2798
There is a nice, simple and tested way of doing that:
/* Create table-value string: */
CREATE TYPE [String_List] AS TABLE ([Your_String_Element] varchar(max) PRIMARY KEY);
GO
/* Create procedure which takes this table as parameter: */
CREATE PROCEDURE [dbo].[usp_ListCheck]
@String_List_In [String_List] READONLY
AS
SELECT a.*
FROM [dbo].[Tags] a
JOIN @String_List_In b ON a.[Name] = b.[Your_String_Element];
I have started using this method to fix the issues we had with the entity framework (was not robust enough for our application). So we decided to give the Dapper (same as Stack) a chance. Also specifying your string list as table with PK column fix your execution plans a lot. Here is a good article of how to pass a table into Dapper - all fast and CLEAN.
Upvotes: 2
Reputation: 16252
This is a reusable variation of the solution in Mark Bracket's excellent answer.
Extension Method:
public static class ParameterExtensions
{
public static Tuple<string, SqlParameter[]> ToParameterTuple<T>(this IEnumerable<T> values)
{
var createName = new Func<int, string>(index => "@value" + index.ToString());
var paramTuples = values.Select((value, index) =>
new Tuple<string, SqlParameter>(createName(index), new SqlParameter(createName(index), value))).ToArray();
var inClause = string.Join(",", paramTuples.Select(t => t.Item1));
var parameters = paramTuples.Select(t => t.Item2).ToArray();
return new Tuple<string, SqlParameter[]>(inClause, parameters);
}
}
Usage:
string[] tags = {"ruby", "rails", "scruffy", "rubyonrails"};
var paramTuple = tags.ToParameterTuple();
var cmdText = $"SELECT * FROM Tags WHERE Name IN ({paramTuple.Item1})";
using (var cmd = new SqlCommand(cmdText))
{
cmd.Parameters.AddRange(paramTuple.Item2);
}
Upvotes: 4
Reputation: 6395
Here is another alternative. Just pass a comma-delimited list as a string parameter to the stored procedure and:
CREATE PROCEDURE [dbo].[sp_myproc]
@UnitList varchar(MAX) = '1,2,3'
AS
select column from table
where ph.UnitID in (select * from CsvToInt(@UnitList))
And the function:
CREATE Function [dbo].[CsvToInt] ( @Array varchar(MAX))
returns @IntTable table
(IntValue int)
AS
begin
declare @separator char(1)
set @separator = ','
declare @separator_position int
declare @array_value varchar(MAX)
set @array = @array + ','
while patindex('%,%' , @array) <> 0
begin
select @separator_position = patindex('%,%' , @array)
select @array_value = left(@array, @separator_position - 1)
Insert @IntTable
Values (Cast(@array_value as int))
select @array = stuff(@array, 1, @separator_position, '')
end
return
end
Upvotes: 9
Reputation: 452947
In SQL Server 2016+ another possibility is to use the OPENJSON
function.
This approach is blogged about in OPENJSON - one of best ways to select rows by list of ids.
A full worked example below
CREATE TABLE dbo.Tags
(
Name VARCHAR(50),
Count INT
)
INSERT INTO dbo.Tags
VALUES ('VB',982), ('ruby',1306), ('rails',1478), ('scruffy',1), ('C#',1784)
GO
CREATE PROC dbo.SomeProc
@Tags VARCHAR(MAX)
AS
SELECT T.*
FROM dbo.Tags T
WHERE T.Name IN (SELECT J.Value COLLATE Latin1_General_CI_AS
FROM OPENJSON(CONCAT('[', @Tags, ']')) J)
ORDER BY T.Count DESC
GO
EXEC dbo.SomeProc @Tags = '"ruby","rails","scruffy","rubyonrails"'
DROP TABLE dbo.Tags
Upvotes: 7
Reputation: 1324
(Edit: If table valued parameters are not available) Best seems to be to split a large number of IN parameters into multiple queries with fixed length, so you have a number of known SQL statements with fixed parameter count and no dummy/duplicate values, and also no parsing of strings, XML and the like.
Here's some code in C# I wrote on this topic:
public static T[][] SplitSqlValues<T>(IEnumerable<T> values)
{
var sizes = new int[] { 1000, 500, 250, 125, 63, 32, 16, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
int processed = 0;
int currSizeIdx = sizes.Length - 1; /* start with last (smallest) */
var splitLists = new List<T[]>();
var valuesDistSort = values.Distinct().ToList(); /* remove redundant */
valuesDistSort.Sort();
int totalValues = valuesDistSort.Count;
while (totalValues > sizes[currSizeIdx] && currSizeIdx > 0)
currSizeIdx--; /* bigger size, by array pos. */
while (processed < totalValues)
{
while (totalValues - processed < sizes[currSizeIdx])
currSizeIdx++; /* smaller size, by array pos. */
var partList = new T[sizes[currSizeIdx]];
valuesDistSort.CopyTo(processed, partList, 0, sizes[currSizeIdx]);
splitLists.Add(partList);
processed += sizes[currSizeIdx];
}
return splitLists.ToArray();
}
(you may have further ideas, omit the sorting, use valuesDistSort.Skip(processed).Take(size[...]) instead of list/array CopyTo).
When inserting parameter variables, you create something like:
foreach(int[] partList in splitLists)
{
/* here: question mark for param variable, use named/numbered params if required */
string sql = "select * from Items where Id in("
+ string.Join(",", partList.Select(p => "?"))
+ ")"; /* comma separated ?, one for each partList entry */
/* create command with sql string, set parameters, execute, merge results */
}
I've watched the SQL generated by the NHibernate object-relational mapper (when querying data to create objects from), and that looks best with multiple queries. In NHibernate, one can specify a batch-size; if many object data rows have to be fetched, it tries to retrieve the number of rows equivalent to the batch-size
SELECT * FROM MyTable WHERE Id IN (@p1, @p2, @p3, ... , @p[batch-size])
,instead of sending hundreds or thousands of
SELECT * FROM MyTable WHERE Id=@id
When the remaining IDs are less then batch-size, but still more than one, it splits into smaller statements, but still with certain length.
If you have a batch size of 100, and a query with 118 parameters, it would create 3 queries:
but none with 118 or 18. This way, it restricts the possible SQL statements to likely known statements, preventing too many different, thus too many query plans, which fill the cache and in great parts never get reused. The above code does the same, but with lengths 1000, 500, 250, 125, 63, 32, 16, 10-to-1. Parameter lists with more than 1000 elements are also split, preventing a database error due to a size limit.
Anyway, it's best to have a database interface which sends parameterized SQL directly, without a separate Prepare statement and handle to call. Databases like SQL Server and Oracle remember SQL by string equality (values change, binding params in SQL not!) and reuse query plans, if available. No need for separate prepare statements, and tedious maintenance of query handles in code! ADO.NET works like this, but it seems like Java still uses prepare/execute by handle (not sure).
I had my own question on this topic, originally suggesting to fill the IN clause with duplicates, but then preferring the NHibernate style statement split: Parameterized SQL - in / not in with fixed numbers of parameters, for query plan cache optimization?
This question is still interesting, even more than 5 years after being asked...
EDIT: I noted that IN queries with many values (like 250 or more) still tend to be slow, in the given case, on SQL Server. While I expected the DB to create a kind of temporary table internally and join against it, it seemed like it only repeated the single value SELECT expression n-times. Time was up to about 200ms per query - even worse than joining the original IDs retrieval SELECT against the other, related tables.. Also, there were some 10 to 15 CPU units in SQL Server Profiler, something unusual for repeated execution of the same parameterized queries, suggesting that new query plans were created on repeated calls. Maybe ad-hoc like individual queries are not worse at all. I had to compare these queries to non-split queries with changing sizes for a final conclusion, but for now, it seems like long IN clauses should be avoided anyway.
Upvotes: 4
Reputation: 486
I'd approach this by default with passing a table valued function (that returns a table from a string) to the IN condition.
Here is the code for the UDF (I got it from Stack Overflow somewhere, i can't find the source right now)
CREATE FUNCTION [dbo].[Split] (@sep char(1), @s varchar(8000))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
FROM Pieces
)
Once you got this your code would be as simple as this:
select * from Tags
where Name in (select s from dbo.split(';','ruby;rails;scruffy;rubyonrails'))
order by Count desc
Unless you have a ridiculously long string, this should work well with the table index.
If needed you can insert it into a temp table, index it, then run a join...
Upvotes: 9
Reputation: 57344
This is possibly a half nasty way of doing it, I used it once, was rather effective.
Depending on your goals it might be of use.
INSERT
each look-up value into that column. IN
, you can then just use your standard JOIN
rules. ( Flexibility++ )This has a bit of added flexibility in what you can do, but it's more suited for situations where you have a large table to query, with good indexing, and you want to use the parametrized list more than once. Saves having to execute it twice and have all the sanitation done manually.
I never got around to profiling exactly how fast it was, but in my situation it was needed.
Upvotes: 34
Reputation: 35696
If you've got SQL Server 2008 or later I'd use a Table Valued Parameter.
If you're unlucky enough to be stuck on SQL Server 2005 you could add a CLR function like this,
[SqlFunction(
DataAccessKind.None,
IsDeterministic = true,
SystemDataAccess = SystemDataAccessKind.None,
IsPrecise = true,
FillRowMethodName = "SplitFillRow",
TableDefinintion = "s NVARCHAR(MAX)"]
public static IEnumerable Split(SqlChars seperator, SqlString s)
{
if (s.IsNull)
return new string[0];
return s.ToString().Split(seperator.Buffer);
}
public static void SplitFillRow(object row, out SqlString s)
{
s = new SqlString(row.ToString());
}
Which you could use like this,
declare @desiredTags nvarchar(MAX);
set @desiredTags = 'ruby,rails,scruffy,rubyonrails';
select * from Tags
where Name in [dbo].[Split] (',', @desiredTags)
order by Count desc
Upvotes: 11
Reputation: 57872
I use a more concise version of the top voted answer:
List<SqlParameter> parameters = tags.Select((s, i) => new SqlParameter("@tag" + i.ToString(), SqlDbType.NVarChar(50)) { Value = s}).ToList();
var whereCondition = string.Format("tags in ({0})", String.Join(",",parameters.Select(s => s.ParameterName)));
It does loop through the tag parameters twice; but that doesn't matter most of the time (it won't be your bottleneck; if it is, unroll the loop).
If you're really interested in performance and don't want to iterate through the loop twice, here's a less beautiful version:
var parameters = new List<SqlParameter>();
var paramNames = new List<string>();
for (var i = 0; i < tags.Length; i++)
{
var paramName = "@tag" + i;
//Include size and set value explicitly (not AddWithValue)
//Because SQL Server may use an implicit conversion if it doesn't know
//the actual size.
var p = new SqlParameter(paramName, SqlDbType.NVarChar(50) { Value = tags[i]; }
paramNames.Add(paramName);
parameters.Add(p);
}
var inClause = string.Join(",", paramNames);
Upvotes: 6
Reputation: 2880
Use a dynamic query. The front end is only to generate the required format:
DECLARE @invalue VARCHAR(100)
SELECT @invalue = '''Bishnu'',''Gautam'''
DECLARE @dynamicSQL VARCHAR(MAX)
SELECT @dynamicSQL = 'SELECT * FROM #temp WHERE [name] IN (' + @invalue + ')'
EXEC (@dynamicSQL)
Upvotes: 2
Reputation: 3116
Use the following stored procedure. It uses a custom split function, which can be found here.
create stored procedure GetSearchMachingTagNames
@PipeDelimitedTagNames varchar(max),
@delimiter char(1)
as
begin
select * from Tags
where Name in (select data from [dbo].[Split](@PipeDelimitedTagNames,@delimiter)
end
Upvotes: 8
Reputation: 331
create FUNCTION [dbo].[ConvertStringToList]
(@str VARCHAR (MAX), @delimeter CHAR (1))
RETURNS
@result TABLE (
[ID] INT NULL)
AS
BEG
IN
DECLARE @x XML
SET @x = '<t>' + REPLACE(@str, @delimeter, '</t><t>') + '</t>'
INSERT INTO @result
SELECT DISTINCT x.i.value('.', 'int') AS token
FROM @x.nodes('//t') x(i)
ORDER BY 1
RETURN
END
--YOUR QUERY
select * from table where id in ([dbo].[ConvertStringToList(YOUR comma separated string ,',')])
Upvotes: 3
Reputation: 10046
We have function that creates a table variable that you can join to:
ALTER FUNCTION [dbo].[Fn_sqllist_to_table](@list AS VARCHAR(8000),
@delim AS VARCHAR(10))
RETURNS @listTable TABLE(
Position INT,
Value VARCHAR(8000))
AS
BEGIN
DECLARE @myPos INT
SET @myPos = 1
WHILE Charindex(@delim, @list) > 0
BEGIN
INSERT INTO @listTable
(Position,Value)
VALUES (@myPos,LEFT(@list, Charindex(@delim, @list) - 1))
SET @myPos = @myPos + 1
IF Charindex(@delim, @list) = Len(@list)
INSERT INTO @listTable
(Position,Value)
VALUES (@myPos,'')
SET @list = RIGHT(@list, Len(@list) - Charindex(@delim, @list))
END
IF Len(@list) > 0
INSERT INTO @listTable
(Position,Value)
VALUES (@myPos,@list)
RETURN
END
So:
@Name varchar(8000) = null // parameter for search values
select * from Tags
where Name in (SELECT value From fn_sqllist_to_table(@Name,',')))
order by Count desc
Upvotes: 25
Reputation: 20758
Here's a cross-post to a solution to the same problem. More robust than reserved delimiters - includes escaping and nested arrays, and understands NULLs and empty arrays.
C# & T-SQL string[] Pack/Unpack utility functions
You can then join to the table-valued function.
Upvotes: 3
Reputation: 4797
Here is another answer to this problem.
(new version posted on 6/4/13).
private static DataSet GetDataSet(SqlConnectionStringBuilder scsb, string strSql, params object[] pars)
{
var ds = new DataSet();
using (var sqlConn = new SqlConnection(scsb.ConnectionString))
{
var sqlParameters = new List<SqlParameter>();
var replacementStrings = new Dictionary<string, string>();
if (pars != null)
{
for (int i = 0; i < pars.Length; i++)
{
if (pars[i] is IEnumerable<object>)
{
List<object> enumerable = (pars[i] as IEnumerable<object>).ToList();
replacementStrings.Add("@" + i, String.Join(",", enumerable.Select((value, pos) => String.Format("@_{0}_{1}", i, pos))));
sqlParameters.AddRange(enumerable.Select((value, pos) => new SqlParameter(String.Format("@_{0}_{1}", i, pos), value ?? DBNull.Value)).ToArray());
}
else
{
sqlParameters.Add(new SqlParameter(String.Format("@{0}", i), pars[i] ?? DBNull.Value));
}
}
}
strSql = replacementStrings.Aggregate(strSql, (current, replacementString) => current.Replace(replacementString.Key, replacementString.Value));
using (var sqlCommand = new SqlCommand(strSql, sqlConn))
{
if (pars != null)
{
sqlCommand.Parameters.AddRange(sqlParameters.ToArray());
}
else
{
//Fail-safe, just in case a user intends to pass a single null parameter
sqlCommand.Parameters.Add(new SqlParameter("@0", DBNull.Value));
}
using (var sqlDataAdapter = new SqlDataAdapter(sqlCommand))
{
sqlDataAdapter.Fill(ds);
}
}
}
return ds;
}
Cheers.
Upvotes: 5
Reputation: 109
Here's a technique that recreates a local table to be used in a query string. Doing it this way eliminates all parsing problems.
The string can be built in any language. In this example I used SQL since that was the original problem I was trying to solve. I needed a clean way to pass in table data on the fly in a string to be executed later.
Using a user defined type is optional. Creating the type is only created once and can be done ahead of time. Otherwise just add a full table type to the declaration in the string.
The general pattern is easy to extend and can be used for passing more complex tables.
-- Create a user defined type for the list.
CREATE TYPE [dbo].[StringList] AS TABLE(
[StringValue] [nvarchar](max) NOT NULL
)
-- Create a sample list using the list table type.
DECLARE @list [dbo].[StringList];
INSERT INTO @list VALUES ('one'), ('two'), ('three'), ('four')
-- Build a string in which we recreate the list so we can pass it to exec
-- This can be done in any language since we're just building a string.
DECLARE @str nvarchar(max);
SET @str = 'DECLARE @list [dbo].[StringList]; INSERT INTO @list VALUES '
-- Add all the values we want to the string. This would be a loop in C++.
SELECT @str = @str + '(''' + StringValue + '''),' FROM @list
-- Remove the trailing comma so the query is valid sql.
SET @str = substring(@str, 1, len(@str)-1)
-- Add a select to test the string.
SET @str = @str + '; SELECT * FROM @list;'
-- Execute the string and see we've pass the table correctly.
EXEC(@str)
Upvotes: 7
Reputation: 37709
I would pass a table type parameter (since it's SQL Server 2008), and do a where exists
, or inner join. You may also use XML, using sp_xml_preparedocument
, and then even index that temporary table.
Upvotes: 20
Reputation: 26051
I heard Jeff/Joel talk about this on the podcast today (episode 34, 2008-12-16 (MP3, 31 MB), 1 h 03 min 38 secs - 1 h 06 min 45 secs), and I thought I recalled Stack Overflow was using LINQ to SQL, but maybe it was ditched. Here's the same thing in LINQ to SQL.
var inValues = new [] { "ruby","rails","scruffy","rubyonrails" };
var results = from tag in Tags
where inValues.Contains(tag.Name)
select tag;
That's it. And, yes, LINQ already looks backwards enough, but the Contains
clause seems extra backwards to me. When I had to do a similar query for a project at work, I naturally tried to do this the wrong way by doing a join between the local array and the SQL Server table, figuring the LINQ to SQL translator would be smart enough to handle the translation somehow. It didn't, but it did provide an error message that was descriptive and pointed me towards using Contains.
Anyway, if you run this in the highly recommended LINQPad, and run this query, you can view the actual SQL that the SQL LINQ provider generated. It'll show you each of the values getting parameterized into an IN
clause.
Upvotes: 68