tpascale

Reputation: 2576

Linq Outer Join on object array

Consider a set of 6 [StringKey,Value] arrays, in pseudo code:

object[,2][6] KeyValueArrays; // illustrated as arrays; could also be List<> or Dictionary<>

I would like to convert this into a single table:

object[,7] KeyWithUpTo6Values  //outer join of the 6 orig key-value sets

If a given key occurs in (say) 3 of the 6 original [Key, Value] arrays, then the joined row for that key will contain 3 values and 3 nulls.

Question: can I do this with LINQ using just simple containers like arrays, generic lists, and dictionaries?

Upvotes: 2

Views: 2165

Answers (2)

Bruno Brant

Reputation: 8554

I might be missing something, but since your question mentions generic lists and dictionaries, I assume that by "array" you mean some kind of vector data structure.

So, key-value pairs can be stored using a Dictionary<T1,T2>. Let's assume, for instance, that your key is a string and your value is a class MyValueClass with a single property of integer type. Your data declaration would look like this:

class Program
{
    class MyValueClass
    {
        public int Value {get;set;}
    }

    // Other elements elided for clarity

    private Dictionary<string, MyValueClass> data = new Dictionary<string, MyValueClass>();

}

Now, you stated that you have N of these structures and would like to do an outer join on them. For instance:

private Dictionary<string, MyValueClass>[] _data = new Dictionary<string, MyValueClass>[6];

The problem here is that the number of "columns" in the joined structure depends on N. Unless you use some other kind of data structure (e.g., a List) to represent each row, you cannot do this dynamically for an arbitrary N, because types in C# are declared statically.

To illustrate, check the query below, where I assume that the array has a length of 4:

var query = from d0 in _data[0]
            join d1 in _data[1] on d0.Key equals d1.Key into d1joined
            from d1 in d1joined.DefaultIfEmpty()
            // Join on d0.Key: an unmatched d1 or d2 is a default pair with a null Key.
            join d2 in _data[2] on d0.Key equals d2.Key into d2joined
            from d2 in d2joined.DefaultIfEmpty()
            join d3 in _data[3] on d0.Key equals d3.Key into d3joined
            from d3 in d3joined.DefaultIfEmpty()
            select new
                     {
                         d0.Key,
                         D0 = d0.Value,
                         D1 = d1.Value,
                         D2 = d2.Value,
                         D3 = d3.Value,
                      };

Don't focus on the joins (I'll explain them later), but check the select new operator. When LINQ assembles this anonymous type, it must know the exact number of properties (our columns), because that is part of the syntax.

So you can write a query that does what you ask, but it will work only for a known value of N. If that is sufficient, the solution is actually simple, although the example I wrote might look a little complex. Note also that it is a left outer join: keys that do not occur in _data[0] never appear in the result.

Back to the query above, you'll see a repeated pattern of join ... into / from ... DefaultIfEmpty(). This pattern was copied from here, and it works simply: it joins two structures by some key into another resulting structure (the into dNjoined above). Conceptually, LINQ processes every record of the left list and, for each of those, every record of the right list (a Cartesian product of N1 x N2), like this:

foreach (var d0 in _data[0])
{
    foreach (var d1 in _data[1])
    {
        if (d0.Key == d1.Key) 
        {
            // Produces an anonymous structure of { d0.Key, d0.Value, d1.Value }
            // and returns it.
        }
    }
}

So, an inner join is the same as combining every pair of rows and then selecting the ones where the keys match. The outer join differs by producing a row even when no key matches, so in our pseudo-code it would be something like:

foreach (var d0 in _data[0])
{
    var matched = false;
    foreach (var d1 in _data[1])
    {
        if (d0.Key == d1.Key) 
        {
            matched = true;
            // Produces an anonymous structure of { d0.Key, d0.Value, d1.Value }
            // and returns it.
        }
    }
    if (!matched)
    {
        // No row on the right had this key:
        // produces an anonymous structure of { d0.Key, d0.Value, null }
    }
}

The no-match case is handled in the LINQ query above by the second from clause. The into dNjoined part collects the matching rows for each left row, which is an empty sequence when there is no match, and DefaultIfEmpty() turns that empty sequence into a single default element so the left row still produces output. (Again, see the link above for more info.)
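
As a rough sketch of what that pattern amounts to in method syntax, the snippet below left-joins two dictionaries with GroupJoin, SelectMany and DefaultIfEmpty. The names LeftJoinSketch and LeftJoin are made up for illustration, and it assumes plain int values and C# 7 tuples rather than the MyValueClass used in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LeftJoinSketch
{
    // Hypothetical helper: left outer join of two dictionaries on their keys.
    // Yields one row per entry of `left`; Right is null when `right` lacks the key.
    public static List<(string Key, int? Left, int? Right)> LeftJoin(
        Dictionary<string, int> left, Dictionary<string, int> right)
    {
        return left
            // "join ... into matches": pair each left row with its group of matches.
            .GroupJoin(right,
                       l => l.Key,
                       r => r.Key,
                       (l, matches) => new { l, matches })
            // "from r in matches.DefaultIfEmpty()": an empty group becomes one null.
            .SelectMany(x => x.matches.Select(r => (int?)r.Value).DefaultIfEmpty(),
                        (x, r) => (Key: x.l.Key, Left: (int?)x.l.Value, Right: r))
            .ToList();
    }

    public static void Main()
    {
        var a = new Dictionary<string, int> { ["x"] = 1, ["y"] = 2 };
        var b = new Dictionary<string, int> { ["y"] = 20, ["z"] = 30 };

        // "x" has no partner in b, so its Right value is null; "z" is dropped
        // because a left join only keeps keys from the left side.
        foreach (var row in LeftJoin(a, b))
            Console.WriteLine(row);
    }
}
```

Projecting the right side to int? before DefaultIfEmpty() is a design choice here: it makes the "no match" case an explicit null instead of a default KeyValuePair.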

Below is a full example that uses the data structure and the LINQ query mentioned above. Hopefully, it's self-explanatory:

using System;
using System.Collections.Generic;
using System.Linq;

namespace TestZone
{
    class Example
    {
        #region Types
        class MyValue
        {
            public int Value { get; set; }

            public override string ToString()
            {
                return string.Format("MyValue(Value = {0})", Value);
            }
        }
        #endregion // Types

        #region Constants
        /// <summary>
        /// Our N
        /// </summary>
        private const int NumberOfArrays = 4;

        /// <summary>
        /// How many rows per dictionary
        /// </summary>
        private const int NumberOfRows = 10; 
        #endregion // Constants

        #region Fields
        private Dictionary<string, MyValue>[] _data = new Dictionary<string, MyValue>[NumberOfArrays]; 
        #endregion // Fields

        #region Constructor
        public Example()
        {
            for (var index = 0; index < _data.Length; index++)
            {
                _data[index] = new Dictionary<string, MyValue>(NumberOfRows);
            }
        } 
        #endregion // Constructor

        public void GenerateRandomData()
        {
            var rand = new Random(DateTime.Now.Millisecond);

            foreach (var dict in _data)
            {
                // Add a number of rows
                for (var i = 0; i < NumberOfRows; i++)
                {
                    var integer = rand.Next(10);    // We use a value of 10 so we have many collisions.
                    dict["ValueOf" + integer] = new MyValue { Value = integer };
                }
            }
        }

        public void OuterJoin()
        {
            // To get the outer join, we have to know the expected N beforehand, as this example shows.
            // Every join is keyed on d0.Key: after DefaultIfEmpty(), an unmatched d1 or d2 is a
            // default pair whose Key is null and would never match anything.
            var query = from d0 in _data[0]
                        join d1 in _data[1] on d0.Key equals d1.Key into d1joined
                        from d1 in d1joined.DefaultIfEmpty()
                        join d2 in _data[2] on d0.Key equals d2.Key into d2joined
                        from d2 in d2joined.DefaultIfEmpty()
                        join d3 in _data[3] on d0.Key equals d3.Key into d3joined
                        from d3 in d3joined.DefaultIfEmpty()
                        select new
                                   {
                                       d0.Key,
                                       D0 = d0.Value,
                                       D1 = d1.Value,
                                       D2 = d2.Value,
                                       D3 = d3.Value,
                                   };

            foreach (var q in query)
            {
                Console.WriteLine(q);
            }
        }
    }

    class Program
    {

        public static void Main()
        {
            var m = new Example();
            m.GenerateRandomData();
            m.OuterJoin();

        }
    }
}
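
If N is not known at compile time, one option is to give up the anonymous type and represent each row as an object[], as hinted at earlier. The following is only a sketch under that assumption: FullOuterJoinSketch and Join are hypothetical names, and the values are typed as object to mirror the question's pseudo-code. Because it starts from the union of all keys, it behaves as a full outer join rather than a left one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FullOuterJoinSketch
{
    // Hypothetical helper: full outer join of any number of key/value dictionaries.
    // Each row has sources.Count + 1 slots: slot 0 is the key, slot i + 1 is the
    // value from sources[i], or null when that source does not contain the key.
    public static List<object[]> Join(IReadOnlyList<Dictionary<string, object>> sources)
    {
        return sources
            .SelectMany(d => d.Keys)   // every key from every source
            .Distinct()                // one output row per distinct key
            .Select(key =>
            {
                var row = new object[sources.Count + 1];
                row[0] = key;
                for (var i = 0; i < sources.Count; i++)
                    row[i + 1] = sources[i].TryGetValue(key, out var value) ? value : null;
                return row;
            })
            .ToList();
    }

    public static void Main()
    {
        var sources = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object> { ["a"] = 1, ["b"] = 2 },
            new Dictionary<string, object> { ["b"] = 20, ["c"] = 30 },
            new Dictionary<string, object> { ["c"] = 300 },
        };

        // Print each row with missing values shown as the text "null".
        foreach (var row in Join(sources))
            Console.WriteLine(string.Join(", ", row.Select(x => x ?? "null")));
    }
}
```

With six source dictionaries, each row would have seven slots, matching the key-plus-six-values table described in the question.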

Upvotes: 2

code4life

Reputation: 15794

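Multi-dimensional arrays do not implement IEnumerable<T>, so LINQ operators cannot be applied to them directly (they only implement the non-generic IEnumerable, so you would first have to call Cast<T>(), which flattens the array). Jagged arrays, on the other hand, can be manipulated with LINQ directly.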
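
A small sketch of the difference, with made-up names (ArrayLinqSketch, Flatten, Keys) and made-up sample data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ArrayLinqSketch
{
    // Rectangular array: only the non-generic IEnumerable is available, so
    // Cast<object>() is needed before other LINQ operators. Elements come out
    // in row-major order, and the two-dimensional shape is lost.
    public static List<object> Flatten(object[,] rectangular) =>
        rectangular.Cast<object>().ToList();

    // Jagged array: object[][] is an IEnumerable<object[]>, so LINQ applies directly.
    public static List<string> Keys(object[][] jagged) =>
        jagged.Select(pair => (string)pair[0]).ToList();

    public static void Main()
    {
        object[,] rectangular = { { "a", 1 }, { "b", 2 } };
        object[][] jagged = { new object[] { "a", 1 }, new object[] { "b", 2 } };

        Console.WriteLine(Flatten(rectangular).Count);      // prints 4
        Console.WriteLine(string.Join(", ", Keys(jagged))); // prints a, b
    }
}
```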

Upvotes: 0
