xb4rrm27m5
xb4rrm27m5

Reputation: 123

How to use IEnumerable.GroupBy comparing multiple properties between elements?

How do I group "adjacent" Sites:

Given data:

List<Site> sites = new List<Site> {
    new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
    new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
    new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
    new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
    new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
    new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
};

I want result:

[
    [
        Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
        Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m }
    ],
    [
        Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m }
    ],
    [
        Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m }
    ],
    [
        Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
        Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m }
    ]
]

I tried using GroupBy with a custom comparer function checking routeIds match and first site's end milepost is equal to the next sites start milepost. My HashKey function just checks out routeId so all sites within a route will get binned together but I think the comparer makes an assumption like if A = B, and B = C, then A = C, so C won't get grouped with A,B,C since in my adjacency case, A will not equal C.

Upvotes: 6

Views: 438

Answers (4)

Sinatr
Sinatr

Reputation: 21999

I was surprised that GroupBy doesn't have overload with Func<..., bool> for in-place grouping without hassle of implementing custom class.

So I've created one:

public static IEnumerable<IEnumerable<T>> GroupBy<T>(this IEnumerable<T> source, Func<T, T, bool> func)
{
    var items = new List<T>();
    foreach (var item in source)
    {
        if (items.Count != 0)
            if (!func(items[0], item))
            {
                yield return items;
                items = new List<T>();
            }
        items.Add(item);
    }
    if (items.Count != 0)
        yield return items;
}

The usage:

var result = sites.GroupBy((x, y) => x.RouteId == y.RouteId &&
    x.StartMilepost <= y.EndMilepost && x.EndMilepost >= y.StartMilepost).ToList();

This should produce wanted result.

Few words about implementation. In above extension method you have to supply delegate which should return true if x and y should be grouped. The method is dumb and will simply compare adjacent items in same order as they come. Your input is ordered, but you may have to use OrderBy/ThenBy before using it with something else.

Upvotes: 1

Theodor Zoulias
Theodor Zoulias

Reputation: 43515

Here is an extension method for grouping lists of the specific class (Site). It is implemented with an inner iterator function GetGroup that produces one group with adjacent sites. This function is called in a while loop to produce all groups.

public static IEnumerable<IEnumerable<Site>> GroupAdjacent(
    this IEnumerable<Site> source)
{
    var ordered = source
        .OrderBy(item => item.RouteId)
        .ThenBy(item => item.StartMilepost);
    IEnumerator<Site> enumerator;
    bool finished = false;
    Site current = null;
    using (enumerator = ordered.GetEnumerator())
    {
        while (!finished)
        {
            yield return GetGroup();
        }
    }

    IEnumerable<Site> GetGroup()
    {
        if (current != null) yield return current;
        while (enumerator.MoveNext())
        {
            var previous = current;
            current = enumerator.Current;
            if (previous != null)
            {
                if (current.RouteId != previous.RouteId) yield break;
                if (current.StartMilepost != previous.EndMilepost) yield break;
            }
            yield return current;
        }
        finished = true;
    }
}

Usage example:

var allGroups = sites.GroupAdjacent();
foreach (var group in allGroups)
{
    foreach (var item in group)
    {
        Console.WriteLine(item);
    }
    Console.WriteLine();
}

Output:

A 0,00..1,00
A 1,00..2,00

A 5,00..7,00

B 3,00..5,00

B 11,00..13,00
B 13,00..14,00

Upvotes: 1

codeMonkey
codeMonkey

Reputation: 4815

Here are a couple of implementations where order of Site's does not matter. You can use the LINQ Aggregate function:

return sites.GroupBy(x => x.RouteId)
            .SelectMany(x =>
            {
                var groupedSites = new List<List<Site>>();
                var aggs = x.Aggregate(new List<Site>(), (contiguous, next) =>
                {
                    if (contiguous.Count == 0 || contiguous.Any(y => y.EndMilepost == next.StartMilepost))
                    {
                        contiguous.Add(next);
                    }
                    else if (groupedSites.Any(y => y.Any(z => z.EndMilepost == next.StartMilepost)))
                    {
                        var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == next.StartMilepost));
                        var el = groupedSites.ElementAt(groupMatchIndex);
                        el.Add(next);
                        groupedSites[groupMatchIndex] = el;
                    }
                    else
                    {
                        groupedSites.Add(contiguous);
                        contiguous = new List<Site>();
                        contiguous.Add(next);
                    }
                    return contiguous;
                }, final => { groupedSites.Add(final); return final; });
                return groupedSites;
            });

Alternatively, just with foreach:

return sites.GroupBy(x => x.RouteId)
            .SelectMany(x =>
            {
                var groupedSites = new List<List<Site>>();
                var aggList = new List<Site>();
                foreach (var item in x)
                {
                    if (aggList.Count == 0 || aggList.Any(y => y.EndMilepost == item.StartMilepost))
                    {
                        aggList.Add(item);
                        continue;
                    }

                    var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == item.StartMilepost));
                    if (groupMatchIndex > -1)
                    {
                        var el = groupedSites.ElementAt(groupMatchIndex);
                        el.Add(item);
                        groupedSites[groupMatchIndex] = el;
                        continue;
                    }

                    groupedSites.Add(aggList);
                    aggList = new List<Site>();
                    aggList.Add(item);
                }

                groupedSites.Add(aggList);
                return groupedSites;
            });

Upvotes: 2

Dmitrii Bychenko
Dmitrii Bychenko

Reputation: 186688

First, let Site class be (for debugging / demonstration)

public class Site {
  public Site() { }

  public string RouteId;
  public Decimal StartMilepost;
  public Decimal EndMilepost;

  public override string ToString() => $"{RouteId} {StartMilepost}..{EndMilepost}";
}

Well, as you can see we have to break the rules: equality must be transitive, i.e. whenever

A equals B
B equals C

then

A equals C

It's not the case in your example. However, if we sort the sites by StartMilepost we, technically, can implement IEqualityComparer<Site> like this:

public class MySiteEqualityComparer : IEqualityComparer<Site> {
  public bool Equals(Site x, Site y) {
    if (ReferenceEquals(x, y))
      return true;
    else if (null == x || null == y)
      return false;
    else if (x.RouteId != y.RouteId)
      return false;
    else if (x.StartMilepost <= y.StartMilepost && x.EndMilepost >= y.StartMilepost)
      return true;
    else if (y.StartMilepost <= x.StartMilepost && y.EndMilepost >= x.StartMilepost)
      return true;

    return false;
  }

  public int GetHashCode(Site obj) {
    return obj == null
      ? 0
      : obj.RouteId == null
         ? 0
         : obj.RouteId.GetHashCode();
  }
}

then GroupBy as usual; please, note that OrderBy is required, since order of comparision matters here. Suppose we have

A = {RouteId="X", StartMilepost=0.00m, EndMilepost=1.00m}
B = {RouteId="X", StartMilepost=1.00m, EndMilepost=2.00m}
C = {RouteId="X", StartMilepost=2.00m, EndMilepost=3.00m}

Here A == B, B == C (so in case of A, B, C all items will be in the same group) but A != C (and thus in A, C, B will end up with 3 groups)

Code:

 List<Site> sites = new List<Site> {
    new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
    new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
    new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
    new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
    new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
    new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
  };

  var result = sites
    .GroupBy(item => item.RouteId)
    .Select(group => group
        // Required Here, since MySiteEqualityComparer breaks the rules
       .OrderBy(item => item.StartMilepost)  
       .GroupBy(item => item, new MySiteEqualityComparer())
       .ToArray())
    .ToArray();

  // Let's have a look
  var report = string.Join(Environment.NewLine, result
    .Select(group => string.Join(Environment.NewLine, 
                                 group.Select(g => string.Join("; ", g)))));

  Console.Write(report);

Outcome:

A 0.00..1.00; A 1.00..2.00
A 5.00..7.00
B 3.00..5.00
B 11.00..13.00; B 13.00..14.00

Upvotes: 3

Related Questions