How to group array of char/string with UNION?

Question

I have a two dimensional array of char, called Letters[ ][ ]

Letters[0][0] = A
       [0][1] = B

Letters[1][0] = C
       [1][1] = D

Letters[2][0] = B
       [2][1] = A
       [2][2] = F

Letters[3][0] = I
       [3][1] = F
       [3][2] = J

I need to group it, so it will be something like this:

group[0] [0] = A
group[0] [1] = B
group[0] [2] = F
group[0] [3] = I
group[0] [4] = J

group[1] [0] = C
group[1] [1] = D

My logic so far for my problem is check every elements with other elements. If both elements are the same letter, it groups together with the whole other array elements with no double/duplicated elements. But, I'm not sure of using C# Linq Union or maybe just a standard array access.

How do I supposed to do to group it in best way? Or are there any other solutions for this?

Peter Duniho · Accepted Answer

I think a pure LINQ solution would be overly complex. This isn't (if I understand your specification correctly) a simple union operation. You want to union based on non-empty intersections. That would mean having to first rearrange the data so LINQ can do a join, to find the data that matches, and since LINQ will only join on equality, doing that while preserving the original grouping information is going to result in syntax that would be more trouble than it's worth, IMHO.

Here is a non-LINQ approach that works for the example you've given:

static void Main(string[] args)
{
    char[][] letters =
    {
        new [] { 'A', 'B' },
        new [] { 'C', 'D' },
        new [] { 'B', 'A', 'F' },
        new [] { 'I', 'F', 'J' },
    };

    List> sets = new List>();

    foreach (char[] row in letters)
    {
        List setIndexes = Enumerable.Range(0, sets.Count)
        .Where(i => row.Any(ch => sets[i].Contains(ch))).ToList();

        CoalesceSets(sets, row, setIndexes);
    }

    foreach (HashSet set in sets)
    {
        Console.WriteLine("{ " + string.Join(", ", set) + " }");
    }
}

private static void CoalesceSets(List> sets, char[] row, List setIndexes)
{
    if (setIndexes.Count == 0)
    {
        sets.Add(new HashSet(row));
    }
    else
    {
        HashSet targetSet = sets[setIndexes[0]];

        targetSet.UnionWith(row);

        for (int i = setIndexes.Count - 1; i >= 1; i--)
        {
            targetSet.UnionWith(sets[setIndexes[i]]);
            sets.RemoveAt(setIndexes[i]);
        }
    }
}

It builds up sets of the input data by scanning the previously identified sets to find which ones the current row of data intersects with, and then coalesces these sets into a single set containing all of the members (your specification appears to impose transitive membership…i.e. if one letter joins sets A and B, and a different letter joins set B and C, you want A, B, and C all joined into a single set).

This isn't an optimal solution, but it's readable. You could avoid the O(N^2) search by maintaining a Dictionary to map each character to the set which contains it. Then instead of scanning all the sets, it's a simple lookup for each character in the current row, to build up the list of set indexes. But there's a lot more "housekeeping" code going that approach; I would not bother implementing it that way unless you find a proven performance issue doing it the more basic way.

By the way: I have a vague recollection I've seen this type of question before on Stack Overflow, i.e. this sort of transitive unioning of sets. I looked for the question but couldn't find it. You may have more luck, and may find there is additional helpful information with that question and its answers.

How to group array of char/string with UNION?

Answers (1)

Related Questions