Reputation: 153
I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over, but after some frustration, I found that this is exactly what was happening. I basically looped through a LINQ query and used the object IDs to make changes to those objects in the database, and those changes affected the values in the .Where() statement.
Does anybody have an explanation for this? It seems like the LINQ query re-runs every time it's iterated over.
NOTE: The fix for this is adding .ToList() after the .Where(), but my question is why this issue is happening at all, i.e. whether it's a bug or something I'm unaware of.
using System;
using System.Linq;

namespace MyTest {
    class Program {
        static void Main () {
            var aArray = new string[] {
                "a", "a", "a", "a"
            };
            var i = 3;
            var linqObj = aArray.Where(x => x == "a");
            foreach (var item in linqObj) {
                aArray[i] = "b";
                i--;
            }
            foreach (var arrItem in aArray) {
                Console.WriteLine(arrItem); //Why does this only print out 2 a's and 2 b's, rather than 4 b's?
            }
            Console.ReadKey();
        }
    }
}
This code is just a reproducible mockup, but I'd expect it to loop through 4 times and change all of the strings in aArray into b's. However, it only loops through twice and turns the last two strings in aArray into b's.
EDIT: After some feedback and to be more concise, my main question here is this: "Why am I able to change what I'm looping over as I'm looping over it". Looks like the overwhelming answer is that LINQ does deferred execution, so it's re-evaluating as I'm looping through the LINQ IEnumerable.
EDIT 2: Actually looking through, it seems that everyone is concerned with the .Count() function, thinking that is what the issue here is. However, you can comment out that line and I still have the issue of the LINQ object changing. I updated the code to reflect the main issue.
Upvotes: 14
Views: 2511
Reputation: 3141
The answer to your first question, why your LINQ query re-runs every time it's iterated over, is LINQ's deferred execution.
This line just declares the LINQ expression and does not execute it:
var linqLIST = aArray.Where(x => x == "a");
and this is where it gets executed:
foreach (var arrItem in aArray)
and
Console.WriteLine(linqList.Count());
An explicit call to ToList() runs the LINQ expression immediately. Use it like this:
var linqList = aArray.Where(x => x == "a").ToList();
Regarding the edited question:
Of course, the LINQ expression is evaluated lazily while the foreach loop runs. The issue is not the Count() call; every enumeration of the LINQ expression evaluates it against the current contents of the array. As mentioned above, materialize it into a List with ToList() and iterate over the list.
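As a rough illustration of the difference between enumerating the deferred query and enumerating a ToList() snapshot, the sketch below counts how often the loop body from the question runs in each case. The names SnapshotDemo and CountIterations are made up for this example and are not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SnapshotDemo
{
    // Hypothetical helper: runs the loop from the question and returns
    // how many times the loop body executed.
    public static int CountIterations(bool snapshot)
    {
        var aArray = new[] { "a", "a", "a", "a" };
        IEnumerable<string> query = aArray.Where(x => x == "a");
        if (snapshot)
        {
            query = query.ToList(); // executes the query now and stores the results
        }

        var i = 3;
        var iterations = 0;
        foreach (var item in query)
        {
            aArray[i] = "b"; // mutates the source array, as in the question
            i--;
            iterations++;
        }
        return iterations;
    }

    static void Main()
    {
        Console.WriteLine(CountIterations(snapshot: false)); // 2
        Console.WriteLine(CountIterations(snapshot: true));  // 4
    }
}
```

With the snapshot, the loop walks a separate list of four "a" strings, so later writes to the array no longer influence which items the loop sees.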
Late edit:
Concerning @Eric Lippert's critique, I will also refer and go into detail for the rest of the OP's questions.
//Why does this only print out 2 a's and 2 b's, rather than 4 b's?
In the first loop iteration i = 3, so after aArray[3] = "b"; your array will look like this:
{ "a", "a", "a", "b" }
In the second loop iteration i has been decremented to 2, and after executing aArray[i] = "b"; your array will be:
{ "a", "a", "b", "b" }
At this point there are still a's in your array, but they sit at indices 0 and 1, which the enumerator has already passed. When the loop asks for a third item, the IEnumerator used internally moves on to indices 2 and 3, finds that neither matches the x == "a" condition any more, and IEnumerator.MoveNext() returns false. The loop therefore reaches its exit condition after two iterations.
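To make that sequence visible, one option is to log every predicate call and every loop body execution. This is only a sketch; TraceDemo, Run and the position counter are names invented for illustration, and the counter matches the array index only because the predicate here is called once per element, in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TraceDemo
{
    // Hypothetical helper: returns a log of predicate tests and loop body steps.
    public static List<string> Run()
    {
        var aArray = new[] { "a", "a", "a", "a" };
        var log = new List<string>();
        var position = 0; // counts predicate calls
        var query = aArray.Where(x =>
        {
            log.Add($"test [{position}] = {x}");
            position++;
            return x == "a";
        });

        var i = 3;
        foreach (var item in query)
        {
            log.Add($"body: set [{i}] = b");
            aArray[i] = "b";
            i--;
        }
        return log;
    }

    static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
        // test [0] = a
        // body: set [3] = b
        // test [1] = a
        // body: set [2] = b
        // test [2] = b
        // test [3] = b
    }
}
```

The predicate tests and the loop body interleave: the elements at indices 2 and 3 are only tested after the loop body has already overwritten them.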
Why am I able to change what I'm looping over as I'm looping over it?
You are able to do so because the built-in code analyzer in Visual Studio does not detect that you modify the collection within the loop. At runtime the array is modified, changing the outcome of the LINQ query, but the array iterator has no handling for this, so no exception is thrown.
This missing handling seems to be by design, as arrays are of fixed size, as opposed to lists, where such an exception is thrown at runtime.
Consider the following example code, which should be equivalent to your initial code example (before the edit):
using System;
using System.Linq;

namespace MyTest {
    class Program {
        static void Main () {
            var aArray = new string[] {
                "a", "a", "a", "a"
            };
            var iterationList = aArray.Where(x => x == "a").ToList();
            foreach (var item in iterationList)
            {
                var index = iterationList.IndexOf(item);
                iterationList.Remove(item);
                iterationList.Insert(index, "b");
            }
            foreach (var arrItem in aArray)
            {
                Console.WriteLine(arrItem);
            }
            Console.ReadKey();
        }
    }
}
This code will compile and iterate the loop once before throwing a System.InvalidOperationException with the message:
Collection was modified; enumeration operation may not execute.
Now the reason why the List implementation throws this error while enumerating it is that it follows a basic concept: for and foreach are iterative control flow statements that need to be deterministic at runtime. Furthermore, the foreach statement is the C#-specific implementation of the iterator pattern, which defines an algorithm that implies sequential traversal, and as such the sequence should not change during execution. Thus the List implementation throws an exception when you modify the collection while enumerating it.
You found one of the ways to modify the sequence a loop iterates over while it is being iterated, so that the query is re-evaluated as the loop proceeds. This is a bad design choice because you might run into an infinite loop if the LINQ expression keeps changing its results and never meets an exit condition for the loop. This makes the code hard to debug, and the behavior will not be obvious when reading the code.
In contrast, there is the while control flow statement, which is a conditional construct and is meant to be non-deterministic at runtime, having a specific exit condition that is expected to change during execution.
Consider this rewrite based on your example:
using System;
using System.Linq;

namespace MyTest {
    class Program {
        static void Main () {
            var aArray = new string[] {
                "a", "a", "a", "a"
            };
            bool arrayHasACondition(string x) => x == "a";
            // Keep going until no element matches the condition any more.
            while (aArray.Any(arrayHasACondition))
            {
                var index = Array.FindIndex(aArray, arrayHasACondition);
                aArray[index] = "b";
            }
            foreach (var arrItem in aArray)
            {
                Console.WriteLine(arrItem); // prints "b" four times
            }
            Console.ReadKey();
        }
    }
}
I hope this outlines the technical background and explains where your expectation went wrong.
Upvotes: 10
Reputation: 110111
Enumerable.Where returns an instance that represents a query definition. When it is enumerated*, the query is evaluated. foreach allows you to work with each item at the time it is found by the query. The query is deferred, but it is also pausable/resumable by the enumeration mechanisms.
var aArray = new string[] { "a", "a", "a", "a" };
var i = 3;
var linqObj = aArray.Where(x => x == "a");
foreach (var item in linqObj)
{
    aArray[i] = "b";
    i--;
}
item="a", aArray[3]="b", i=2
item="a", aArray[2]="b", i=1
*Note: "is enumerated" means GetEnumerator and MoveNext are called. This does not mean that the query is fully evaluated and the results held in a snapshot.
For further understanding, read up on yield return and how to write a method that uses that language feature. If you do this, you'll understand what you need in order to write Enumerable.Where.
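Following the yield return suggestion, here is a minimal sketch of a Where-like method. MyWhere is a simplified stand-in written for this example; it is not the actual implementation of Enumerable.Where, which has additional argument checks and optimizations.

```csharp
using System;
using System.Collections.Generic;

static class MyEnumerable
{
    // Simplified, hypothetical stand-in for Enumerable.Where.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var element in source)
        {
            if (predicate(element))
            {
                // Execution pauses here until the caller asks for the next item.
                yield return element;
            }
        }
    }
}

static class YieldDemo
{
    // Runs the loop from the question against MyWhere and counts the iterations.
    public static int CountIterations()
    {
        var aArray = new[] { "a", "a", "a", "a" };
        var i = 3;
        var iterations = 0;
        foreach (var item in aArray.MyWhere(x => x == "a"))
        {
            aArray[i] = "b";
            i--;
            iterations++;
        }
        return iterations;
    }

    static void Main()
    {
        Console.WriteLine(CountIterations()); // 2
    }
}
```

Because the body of MyWhere only advances when the outer foreach requests another item, each element is read from the array at the moment it is needed, which is why the writes made in the loop body are visible to the later predicate checks.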
Upvotes: 3
Reputation: 43545
You could upgrade the «avoid side-effects while enumerating an array» advice to a requirement, by utilizing the extension method below:
private static IEnumerable<T> DontMessWithMe<T>(this T[] source)
{
    var copy = source.ToArray();
    return source.Zip(copy, (x, y) =>
    {
        if (!EqualityComparer<T>.Default.Equals(x, y))
            throw new InvalidOperationException(
                "Array was modified; enumeration operation may not execute.");
        return x;
    });
}
Now chain this method to your query and watch what happens. 😃
var linqObj = aArray.DontMessWithMe().Where(x => x == "a");
Of course this comes with a cost. Now every time you enumerate the array, a copy is created. This is why I don't expect that anyone will use this extension, ever!
Upvotes: 2
Reputation: 660138
Why am I able to edit a LINQ list while iterating over it?
All of the answers that say that this is because of deferred "lazy" execution are wrong, in the sense that they do not adequately address the question that was asked: "Why am I able to edit a list while iterating over it?" Deferred execution explains why running the query twice gives different results, but does not address why the operation described in the question is possible.
The problem is actually that the original poster has a false belief:
I recently came across an issue where I was able to change the IEnumerable object that I was iterating over in a foreach loop. It's my understanding that in C#, you aren't supposed to be able to edit the list you're iterating over
Your understanding is wrong, and that's where the confusion comes from. The rule in C# is not "it is impossible to edit an enumerable from within an enumeration". The rule is you are not supposed to edit an enumerable from within an enumeration, and if you choose to do so, arbitrarily bad things can happen.
Basically what you're doing is running a stop sign and then asking "Running a stop sign is illegal, so why did the police not prevent me from running the stop sign?" The police are not required to prevent you from doing an illegal act; you are responsible for not making the attempt in the first place, and if you choose to do so, you take the chance of getting a ticket, or causing a traffic accident, or any other bad consequence of your poor choice. Usually the consequences of running a stop sign are no consequences at all, but that does not mean that it's a good idea.
Editing an enumerable while you're enumerating it is a bad practice, but the runtime is not required to be a traffic cop and prevent you from doing so. Nor is it required to flag the operation as illegal with an exception. It may do so, and sometimes it does do so, but there is not a requirement that it does so consistently.
You've found a case where the runtime does not detect the problem and does not throw an exception, but you do get a result that you find unexpected. That's fine. You broke the rules, and this time it just happens that the consequence of breaking the rules was an unexpected outcome. The runtime is not required to make the consequence of breaking the rules into an exception.
If you tried to do the same thing where, say, you called Add on a List<T> while enumerating the list, you'd get an exception because someone wrote code in List<T> that detects that situation.
No one wrote that code for "linq over an array", and so, no exception. The authors of LINQ were not required to write that code; you were required to not write the code you wrote! You chose to write a bad program that violates the rules, and the runtime is not required to catch you every time you write a bad program.
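The contrast can be sketched as follows: a List<T> detects modification during enumeration, while writing to an array element inside a foreach over that array goes unnoticed. DetectionDemo, ListThrows and ArrayThrows are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class DetectionDemo
{
    // Returns true if calling Add inside foreach raised InvalidOperationException.
    public static bool ListThrows()
    {
        var list = new List<string> { "a", "a" };
        try
        {
            foreach (var item in list)
            {
                list.Add("b"); // the list's enumerator notices this on its next MoveNext
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    // Returns true if writing to an array element inside foreach raised an exception.
    public static bool ArrayThrows()
    {
        var array = new[] { "a", "a" };
        try
        {
            foreach (var item in array)
            {
                array[1] = "b"; // no detection code exists for arrays
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(ListThrows());  // True
        Console.WriteLine(ArrayThrows()); // False
    }
}
```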
It seems like the LINQ query re-runs every time it's iterated over
That is correct. A query is a question about a data structure. If you change that data structure, the answer to the question can change. Enumerating the query answers the question.
However, that is an entirely different issue than the one in the title of your question. You really have two questions here:
Why am I able to edit an enumerable while I am enumerating it?
You can do this bad practice because nothing stops you from writing a bad program except your good sense; write better programs that do not do this!
Does the query re-run every time it's iterated over?
Yes; a query is a question, not an answer. An enumeration of the query is an answer, and the answer can change over time.
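A small sketch of "the answer can change over time": the same query object is enumerated twice, once before and once after the array is changed. QuestionDemo and Run are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;

static class QuestionDemo
{
    // Asks the same question (the query) twice, around a change to the data.
    public static (int before, int after) Run()
    {
        var aArray = new[] { "a", "a", "a", "a" };
        var query = aArray.Where(x => x == "a"); // defines the question; nothing runs yet

        var before = query.Count(); // first enumeration

        aArray[2] = "b";
        aArray[3] = "b";

        var after = query.Count();  // second enumeration sees the modified array
        return (before, after);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine(result.before); // 4
        Console.WriteLine(result.after);  // 2
    }
}
```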
Upvotes: 22
Reputation: 4344
IEnumerable in C# is lazy. This means that you get the result whenever you force it to evaluate. In your case, Count() forces linqLIST to evaluate every time you call it. By the way, linqLIST is not a list at this point.
Upvotes: 2