Reputation: 592
Basically what I'm trying to do is take a list of objects and filter it based on a number of criteria where one of the criteria is that the key doesn't exist in another list. Here's an example:
My two classes are similar to this:
public class Test
{
public string name;
public string instructor_name;
public string course;
}
public class Appointment
{
public string site;
public DateTime forWhen;
public string testName;
}
I want to sort through a List<Test> by looking at the course and making sure that test doesn't exist in the List<Appointment>. In SQL I'd do it something like this:
SELECT new Group<Test>(c.Key, c)
FROM tests in testList
WHERE tests.Course != "Science"
AND tests.name NOT IN (SELECT testName FROM appotList)
However, I can't figure out how I would do this in LINQ. Any ideas?
Upvotes: 8
Views: 20079
Reputation: 185593
If you're talking about performing client-side filtering, that's pretty easy with LINQ to Objects. Something like this:
List<Test> tests = ...;
List<Appointment> appts = ...;
var query = tests.Except(
tests.Join(appts, t => t.name, a => a.testName, (t, a) => t));
The following is slightly simpler to read:
var query = tests.Where(t => !appts.Any(a => a.testName == t.name));
But the first version will be faster, since the Join
function will compute a hash table of the matches rather than doing a linear search of the appts
list for every element in tests
.
Upvotes: 12
Reputation: 46008
http://introducinglinq.com/blogs/marcorusso/archive/2008/01/14/the-not-in-clause-in-linq-to-sql.aspx
Consider this code that returns all the customers who don't have an order in the Orders table. This is one SQL query that returns that value.
SELECT *
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] NOT IN (
SELECT [t1].[CustomerID]
FROM [dbo].[Orders] AS [t1]
)
This is not the faster way to get the desired result (using a NOT EXISTS is the favorite way - more on this shortly). LINQ offers a Contains extension method that allows writing the following code.
NorthwindDataContext dc = new NorthwindDataContext();
dc.Log = Console.Out;
var query =
from c in dc.Customers
where !(from o in dc.Orders
select o.CustomerID)
.Contains(c.CustomerID)
select c;
foreach (var c in query) Console.WriteLine( c );
In LINQ to SQL the query is translated into this SQL code:
SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City],
[t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
WHERE [t1].[CustomerID] = [t0].[CustomerID]
))
This approach is not only semantically equivalent, but also faster in execution. The following is the result with SET STATISTICS IO ON. The first result is for the hand-written query that use the NOT IN clause. The second result is for the LINQ to SQL generated query.
Upvotes: 6