Reputation: 7611
I have the following LINQ query, which will be used to find any consignments that are 'similar':
from c in cons
group c by new { c.TripDate.Value, c.DeliveryPostcode, c.DeliveryName } into cg
let min = cg.Min(a => a.DeliverFrom)
let max = cg.Max(a => a.DeliverFrom)
let span = max - min
where span.TotalMinutes <= 59
select cg;
The main thing is the min, max and span. Basically, any consignments that are in the 'group', that have a DeliverFrom datetime within 59 minutes of any other one in the group, will be returned in the group.
The code above looked good originally to me, but upon further inspection it seems that if there's more than 2 records in the group - 2 with DeliverFrom dates 59 minutes of each other, and one with a DeliverFrom date not within 59 minutes of any, then the query would not return that group, as it'll be selecting the min and the max and seeing that the difference is more than 59 minutes. What I want to happen is to see that there are 2 consignments in the group with DeliverFrom dates close enough, and just select a group containing them two.
How would I go about doing this?
EDIT: Doh, another clause has been added in this. There's a field called 'Weight' and one called 'Spaces', each group can have a max of 26 Weight and 26 Spaces
Upvotes: 3
Views: 191
Reputation: 437404
I 'd group the consignments with the same logic like you do but use this overload of GroupBy
instead, allowing me to project each group of consigments into another type. This type would here be an enumerable sequence of groups of consigments, each element in which represents consignments that non only were in the same group to begin with, but also should all be delivered within the duration of an hour. So the signature of resultSelector
would be
Func<anontype, IEnumerable<Consignment>, IEnumerable<IEnumerable<Consignment>>>
At this point it becomes clear that it would probably be a good idea to define a type for the grouping so that you can get rid of the anonymous type in the above signature; otherwise you 'd be forced to define your resultSelector
as a lambda.
Within resultSelector
, you need to first of all sort the incoming group of consignments by DeliverFrom
and then return sub-groups based on that time. So it might look like this:
IEnumerable<IEnumerable<Consignment>>
Partitioner(ConsignmentGroupKey key, IEnumerable<Consignment> cg)
{
cg = cg.OrderBy(c => c.DeliverFrom);
var startTime = cg.First().DeliverFrom;
var subgroup = new List<Consignment>();
foreach(var cons in cg) {
if ((cons.DeliverFrom - startTime).TotalMinutes < 60) {
subgroup.Add(cons);
}
else {
yield return subgroup;
startTime = cons.DeliverFrom;
subgroup = new List<Consignment>() { cons };
}
}
if (subgroup.Count > 0) {
yield return subgroup;
}
}
I haven't tried this, but as far as I can tell it should work.
Upvotes: 1
Reputation: 51339
If I'm not mistaken, what you are looking for is a statistical problem called cluster identification, and if so it is a far more complex problem than you might think.
As a thought exercise, imagine if you had 3 entries, at 1:00, 1:30, and 2:00. How would you want to group these? Either the first two or the last two would work as a group (less than 59 minutes apart), but all 3 would not.
If you just want to keep chaining items together into a group as long as they are within 59 minutes of any other item in the group, you'd need to keep iterating until you stop finding new items to add to any cluster.
Upvotes: 3