barlop

Reputation: 13820

C#: How does Enumerable.SelectMany with two function arguments produce a combined enumerable, based on the MSDN description?

This is not a duplicate of "Difference Between Select and SelectMany". In that question and its answers, most examples use SelectMany with just one function parameter, and I understand that case. Of the two answers that use SelectMany with two functions, one simply states as a given that it combines two enumerables; it doesn't explain how. The MSDN documentation doesn't describe this overload as combining two collections. It describes a more mechanical process, and I can't see how that process produces the result, though I may not fully understand what MSDN is describing for this particular overload.

I understand the case of SelectMany with one function. It goes through each element of the enumerable, transforms it with the given function, and then flattens the results into one sequence.

So I understand this

string[] str1 = { "abc", "def", "ghi" };
var res = str1.SelectMany(x => "z" + x + "z");
foreach (char x in res) Console.Write(x);
 // prints zabczzdefzzghiz

{"abc","def","ghi"}
becomes
{"zabcz","zdefz","zghiz"}
(that's after the transform function)

And then it gets flattened. Each string is treated as a sequence of char, and those sequences are concatenated.
{'z','a','b','c','z','z','d','e','f','z','g','h','i','z'}

Where I struggle though is when "Enumerable.SelectMany" has a second function.

msdn page listing overloads for SelectMany

https://msdn.microsoft.com/en-us/library/system.linq.enumerable.selectmany(v=vs.100).aspx

msdn page for the SelectMany with 2 functions

https://msdn.microsoft.com/en-us/library/bb534631(v=vs.100).aspx

So the first function is called a collectionSelector, and the second function a resultSelector.

The MSDN description reads: "Projects each element of a sequence to an IEnumerable, flattens the resulting sequences into one sequence, and invokes a result selector function on each element therein."

It sounds from the description like it invokes function1, then flattens the result, then invokes function2.

So now looking at such an example

string[] str1 = { "abc", "def", "ghi" };
string[] str2 = { "qrs", "tuv", "wxy" };

var res = str1.SelectMany(x => str2, (s1, s2) => s1 + s2);

foreach (string x in res) Console.Write(x+" ");

//prints abcqrs abctuv abcwxy defqrs deftuv defwxy ghiqrs ghituv ghiwxy

{ "abc", "def", "ghi" }

becomes after f1

{ { "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" } }

Now, if I try to follow what I understand of the MSDN description, it says this is flattened. But once I flatten it, I can't see any way this can work:

{ "qrs", "tuv", "wxy" , "qrs", "tuv", "wxy" , "qrs", "tuv", "wxy" }

If on the other hand I don't flatten it so I still have

{ { "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" } }

Then I could imagine that perhaps s1 is an element of the original, and s2 is an element of that intermediate IEnumerable. It goes into each element of the intermediate: for the first element it applies s1="abc", for the second s1="def", and for the third s1="ghi". For each string inside that element it computes s1+s2. And so you get

{ { "abcqrs", "abctuv", "abcwxy" },{ "defqrs", "deftuv", "defwxy" },{ "ghiqrs", "ghituv", "ghiwxy" } }

And then I could imagine that it flattens it

{ "abcqrs", "abctuv", "abcwxy" , "defqrs", "deftuv", "defwxy" , "ghiqrs", "ghituv", "ghiwxy" }

And that works as an explanation, but it's nothing like what MSDN describes.
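
The "combine inside each inner collection, then flatten" model above can be written directly with the one-function overload and a nested Select. This is only a minimal sketch, assuming the same str1 and str2 as in the question; the class and method names (NestedSelectSketch, TwoFunctions, OneFunctionWithSelect) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NestedSelectSketch
{
    static readonly string[] str1 = { "abc", "def", "ghi" };
    static readonly string[] str2 = { "qrs", "tuv", "wxy" };

    // Two-function overload, exactly as in the question.
    public static IEnumerable<string> TwoFunctions() =>
        str1.SelectMany(s1 => str2, (s1, s2) => s1 + s2);

    // One-function overload: combine inside each inner collection,
    // then let SelectMany flatten the already-combined strings.
    public static IEnumerable<string> OneFunctionWithSelect() =>
        str1.SelectMany(s1 => str2.Select(s2 => s1 + s2));

    static void Main()
    {
        Console.WriteLine(string.Join(" ", TwoFunctions()));
        Console.WriteLine(string.Join(" ", OneFunctionWithSelect()));
        // Both lines print:
        // abcqrs abctuv abcwxy defqrs deftuv defwxy ghiqrs ghituv ghiwxy
    }
}
```

Both forms produce the same sequence, so for this example the model gives the right answer even though it is worded differently from the MSDN text.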

MSDN describes the intermediate result being flattened straight after the first function, and that baffles me. I do not understand how, going by the MSDN description, we get that result, though I don't really understand the MSDN description itself.

Many take it as a given that it combines two IEnumerables, but MSDN doesn't say that; it describes a process. I'd like to understand how that process leads to two IEnumerables being combined the way they are.

Upvotes: 3

Views: 1558

Answers (1)

Kirin Yao

Reputation: 1636

Notice that it is not "first f1 on everything, then f2". The whole process works like a pipeline:

  1. Go to the first element ("abc") of the source collection.
  2. Call f1 on it to get the first intermediate collection ({ "qrs", "tuv", "wxy" }).
  3. Go to the first element of the intermediate collection ("qrs") and call f2 on it, together with the current source element.
  4. Yield the first result, abcqrs.
  5. Go back to step 3 for the next element, until the first intermediate collection has no more elements.
  6. ...

So there is never an intermediate collection with the elements { { "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" },{ "qrs", "tuv", "wxy" } }. The results are yielded one by one.

The MSDN description DOES confuse people. It sounds as if the method has a separate intermediate flattening step. But actually, the flattening happens as you iterate the result.
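
One way to see the pipeline behaviour is to log each call to the two functions. The following is a rough sketch, not taken from the original post: the class name LazyTraceSketch, the Trace method, and the shortened two-element arrays are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyTraceSketch
{
    // Returns the order in which f1 (collectionSelector) and
    // f2 (resultSelector) are actually invoked.
    public static List<string> Trace()
    {
        var log = new List<string>();
        string[] str1 = { "abc", "def" };
        string[] str2 = { "qrs", "tuv" };

        var res = str1.SelectMany(
            s1 => { log.Add("f1(" + s1 + ")"); return str2; },
            (s1, s2) => { log.Add("f2(" + s1 + "," + s2 + ")"); return s1 + s2; });

        // Neither function has run yet: execution is deferred until enumeration.
        log.Add("start");
        foreach (string x in res) log.Add("got " + x);
        return log;
    }

    static void Main()
    {
        foreach (string line in Trace()) Console.WriteLine(line);
        // prints, one per line:
        // start, f1(abc), f2(abc,qrs), got abcqrs, f2(abc,tuv), got abctuv,
        // f1(def), f2(def,qrs), got defqrs, f2(def,tuv), got deftuv
    }
}
```

f1 is called for "def" only after every result for "abc" has been handed out, so a nested or flattened intermediate collection is never built up front.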

added by barlop

The .NET implementation code Peter mentioned is really helpful in this case. (Note: I've removed some unnecessary braces from it for tidiness.) It confirms what is written above.

    static IEnumerable<TResult> SelectManyIterator<TSource, TCollection, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (TSource element in source)
            foreach (TCollection subElement in collectionSelector(element))
                yield return resultSelector(element, subElement);
    }
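
To try this out, the iterator can be wrapped in a small program and compared against the built-in method. This is only a sketch under stated assumptions: IteratorSketch, MySelectMany and MatchesBuiltIn are hypothetical names, and the body is a copy of the snippet above rather than the framework's actual source file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IteratorSketch
{
    // Hypothetical stand-in with the same body as the iterator quoted above.
    public static IEnumerable<TResult> MySelectMany<TSource, TCollection, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (TSource element in source)
            foreach (TCollection subElement in collectionSelector(element))
                yield return resultSelector(element, subElement);
    }

    // Compares the hand-written iterator with Enumerable.SelectMany
    // on the example from the question.
    public static bool MatchesBuiltIn()
    {
        string[] str1 = { "abc", "def", "ghi" };
        string[] str2 = { "qrs", "tuv", "wxy" };
        var mine = MySelectMany(str1, s1 => str2, (s1, s2) => s1 + s2);
        var builtIn = str1.SelectMany(s1 => str2, (s1, s2) => s1 + s2);
        return mine.SequenceEqual(builtIn);
    }

    static void Main()
    {
        Console.WriteLine(MatchesBuiltIn()); // prints True
    }
}
```

The two nested foreach loops are all there is to it: resultSelector receives the outer element and the inner element together, which is why s1 is still available when s2 is being processed.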

Upvotes: 3
