christiaantober
christiaantober

Reputation: 341

DbQuery behaves differently in a foreach loop. Why?

If I use the following code, I get a list of students that study course 1 and course 2. (This is almost what I want.)

IQueryable<Student> filteredStudents = context.Students;
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(1));
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(2));
List<Student> studentList = filteredStudents.ToList<Student>();  

However, if I try and do this in a foreach loop (as shown in the following code) then I get a list of all students that are signed up for the last course in the loop.

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID));
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

This behaviour confuses me. Can anyone explain why this is happening? And how to get around it? Thank you.

Upvotes: 3

Views: 446

Answers (4)

Joel Coehoorn
Joel Coehoorn

Reputation: 415735

The problem is that the foreach loop only creates a single course variable for all of the loop iterations, and this single variable is then captured into a closure. Also remember that the filters aren't actually executed until after the loop. Put those together, and by the time the filters execute this single course variable has advanced to the last item in the Courses filter; you only check against that one last course.

I see four ways to fix the problem.

First

Create a new variable for each iteration of the loop (this is probably your best quick fix)

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {  
        int CourseID = course.CourseID;            
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(CourseID));
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

Second

Resolve the IEnumerable expression inside the loop (probably much less efficient):

IEnumerable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
            .ToList(); 
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

Third

Use more appropriate linq operators/lambda expressions to eliminate the foreach loop:

var studentList = context.Students.Where(s => s.Courses.Select(c => c.CourseID).Intersect(filter.Courses.Select(c => c.CourseID)).Any()).ToList();

Or in a more readable way:

IQueryable<Student> filteredStudents = context.Students;
var courses = filter.Courses.Select(c => c.CourseID);
var studentList = filteredStudents
       .Where(s => s.Courses.Select(c => c.CourseID)
                       .Intersect(courses)
                       .Any()
       ).ToList();

If you play with this a bit, the performance should meet or far exceed the foreach loop through clever internal use of HashSets or — if you're very lucky — by sending a JOIN query to the DB). Just be careful, because it's easy to write something here that make numerous "extra" calls out to the DB inside that Intersect() or Any() method. Even so, this is the option I tend to prefer, with the exception that I probably wouldn't bother to call .ToList() at the end.

This also illustrates why I don't have much use for ORMs like Entity Framework, linq-to-sql, or even NHibernate or ActiveRecord. If I'm just writing SQL, I can know I'm getting the correct join query. I could do that with an ORM, too, but now I still need to know about the specific SQL I'm creating, and I also have to know how to get the ORM to do what I want.

Fourth

Use C# 5.0. This is fixed in the most recent version of C#, so that each iteration of a for/foreach loop is its own variable.

Upvotes: 4

jjj
jjj

Reputation: 4997

If you're trying to get every Student that is enrolled in every course in filter.Courses, you could try:

var courseIDs = filter.Courses.Select(c => c.CourseID);
var filteredStudents = context.Students
    .Where(s => !courseIDs.Except(s.Courses.Select(c => c.CourseId)).Any())

which filters on whether courseIDs is a subset of a Student's course IDs.

Edit

Joel Coehoorn and Mikael Eliasson give a good explanation for why all of the students in the last course were retrieved.

Upvotes: 1

Mikael Eliasson
Mikael Eliasson

Reputation: 5227

I don't think this has to do with Entity Framework. It's a bug (not really, but a stupid design in c#) where the variable is declared outside the loop.

In this case it means that because the IEnumerable is lazily evaluated it will use the LAST value of the variable. Use a temp variable in the loop to solve it.

foreach (Course course in filter.Courses) {
    if (course != null) {
        var cId = course.CourseID;       
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(cId))
                .Select(s => s);
    }
}

Or even better if you have the navigation properties properly defined. Just do:

var studentList = filter.Courses.SelectMany(c => c.Students).ToList()

See more info here: Is there a reason for C#'s reuse of the variable in a foreach?

Upvotes: 0

Christopher
Christopher

Reputation: 134

Because "filteredStudents = filteredStudents.Where..." is a straight assignment to the variable, each time through the loop you are completely replacing what was in it before. you need to append to it, not replace. Try a search for "c# AddRange"

Upvotes: 0

Related Questions