Reputation: 11730
I just saw this bit of code that has a count++
side-effect in the .GroupBy
predicate. (originally here).
object[,] data; // This contains all the data.
int count = 0;
List<string[]> dataList = data.Cast<string>()
.GroupBy(x => count++ / data.GetLength(1))
.Select(g => g.ToArray())
.ToList();
This terrifies me because I have no idea how many times the implementation will invoke the key selector function. And I also don't know if the function is guaranteed to be applied to each item in order. I realize that, in practice, the implementation may very well just call the function once per item in order, but I never assumed that as being guaranteed, so I'm paranoid about depending on that behaviour -- especially given what may happen on other platforms, other future implementations, or after translation or deferred execution by other LINQ providers.
As it pertains to a side-effect in the predicate, are we offered some kind of written guarantee, in terms of a LINQ specification or something, as to how many times the key selector function will be invoked, and in what order?
Please, before you mark this question as a duplicate, I am looking for a citation of documentation or specification that says one way or the other whether this is undefined behaviour or not.
For what it's worth, I would have written this kind of query the long way, by first performing a select query with a predicate that takes an index, then creating an anonymous object that includes the index and the original data, then grouping by that index, and finally selecting the original data out of the anonymous object. That seems more like a correct way of doing functional programming. And it also seems more like something that could be translated to a server-side query. The side-effect in the predicate just seems wrong to me - and against the principles of both LINQ and functional programming, so I would assume there would be no guarantee specified and that this may very well be undefined behaviour. Is it?
I realize this question may be difficult to answer if the documentation and LINQ specification actually says nothing regarding side effects in predicates. I want to know specifically whether:
Upvotes: 1
Views: 298
Reputation: 3915
According to official C# Language Specification, on page 203, we can read (emphasis mine):
12.17.3.1 The C# language does not specify the execution semantics of query expressions. Rather, query expressions are translated into invocations of methods that adhere to the query-expression pattern (§12.17.4). Specifically, query expressions are translated into invocations of methods named Where, Select, SelectMany, Join, GroupJoin, OrderBy, OrderByDescending, ThenBy, ThenByDescending, GroupBy, and Cast. These methods are expected to have particular signatures and return types, as described in §12.17.4. These methods may be instance methods of the object being queried or extension methods that are external to the object. These methods implement the actual execution of the query.
Upvotes: 1
Reputation: 685
From looking at the source code of GroupBy in corefx on GitHub, it does seems like the key selector function is indeed called once per element, and it is called in the order that the previous IEnumerable provides them. I would in no way consider this a guarantee though.
In my view, any IEnumerables which cannot be enumerated multiple times safely are a big red flag that you may want to reconsider your design choices. An interesting issue that could arise from this is that for example if you view the contents of this IEnumerable in the Visual Studio debugger, it will probably break your code, since it would cause the count
variable to go up.
The reason this code hasn't exploded up until now is probably because the IEnumerable is never stored anywhere, since .ToList
is called right away. Therefore there is no risk of multiple enumerations (again, with the caveat about viewing it in the debugger and so on).
Upvotes: 0