Reputation: 289
I have a central repository for IoT device logs. Each log has a timestamp when it arrives. The problem I want to solve: over a given time span, the same device might send multiple logs about its interaction with a specific catalyst. I want to treat that set of logs as a single event rather than, say, 5 disparate logs, so that I count the number of interactions and not the number of logs.
Data Set
public class Data
{
    public Guid DeviceId { get; set; }
    public DateTime StartTime { get; set; }
    public DateTime EndDateTime { get; set; }
    public int Id { get; set; }
    public int Direction { get; set; }
}
Data d1 = new Data();// imagine it's populated
Data d2 = new Data();// imagine it's populated
I am looking for a LINQ query that would yield something along the lines of
if (d1.DeviceId == d2.DeviceId && d1.Id == d2.Id && d1.Direction == d2.Direction && (d1.StartTime - d2.StartTime).Duration() < TimeSpan.FromMinutes(15))
If I know that the same IoT device is interacting with the same Id (catalyst), the Direction is the same, and all of those logs occur within a 15-minute time span, it can be presumed that they correspond to the same catalyst event.
I do not control the log creation, so no, I cannot update the data to include "something" that would indicate the relationship.
Sample data, per request. Nothing fancy. I am sure most people suspect that I have 30+ properties and that I am only showing the ones relevant to the calculation; this is a simple set of possibilities.
class SampleData
{
    public List<Data> GetSampleData()
    {
        Guid device1 = Guid.NewGuid();
        List<Data> dataList = new List<Data>();

        // first log for catalyst 555: starts an event
        Data data1 = new Data();
        data1.DeviceId = device1;
        data1.Id = 555;
        data1.Direction = 1;
        data1.StartTime = new DateTime(2010, 8, 18, 16, 32, 0);
        data1.EndDateTime = new DateTime(2010, 8, 18, 16, 32, 30);
        dataList.Add(data1);

        // so this data point should be excluded in the final result
        Data data2 = new Data();
        data2.DeviceId = device1;
        data2.Id = 555;
        data2.Direction = 1;
        data2.StartTime = new DateTime(2010, 8, 18, 16, 32, 32);
        data2.EndDateTime = new DateTime(2010, 8, 18, 16, 33, 30);
        dataList.Add(data2);

        // should be included because Id is different
        Data data3 = new Data();
        data3.DeviceId = device1;
        data3.Id = 600;
        data3.Direction = 1;
        data3.StartTime = new DateTime(2010, 8, 18, 16, 32, 2);
        data3.EndDateTime = new DateTime(2010, 8, 18, 16, 32, 35);
        dataList.Add(data3);

        // exclude due to time (within 15 minutes of data3)
        Data data4 = new Data();
        data4.DeviceId = device1;
        data4.Id = 600;
        data4.Direction = 1;
        data4.StartTime = new DateTime(2010, 8, 18, 16, 32, 37);
        data4.EndDateTime = new DateTime(2010, 8, 18, 16, 33, 40);
        dataList.Add(data4);

        // include because time > 15 minutes
        Data data5 = new Data();
        data5.DeviceId = device1;
        data5.Id = 600;
        data5.Direction = 1;
        data5.StartTime = new DateTime(2010, 8, 18, 16, 58, 42);
        data5.EndDateTime = new DateTime(2010, 8, 18, 16, 58, 50);
        dataList.Add(data5);

        return dataList;
    }
}
Upvotes: 2
Views: 447
Reputation: 26907
This turned out to be more complex than I hoped for.
I used a custom LINQ extension method of mine called ScanPair, which is a variation of my Scan method, which is a version of the APL scan operator (like Aggregate, but it returns the intermediate results). ScanPair returns the intermediate results of the operation along with each original value. I need to think about how to make all of these more general purpose, as the pattern is used by a bunch of my other extension methods for grouping by various conditions (e.g. sequential, runs, while a test is true or false).
public static class IEnumerableExt {
    // Pairs each source item with the running result (Key) computed up to and including that item.
    public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> seedFn, Func<(TKey Key, T Value), T, TKey> combineFn) {
        using (var srce = src.GetEnumerator()) {
            if (srce.MoveNext()) {
                // the first item is passed to seedFn to produce the initial Key
                var seed = (seedFn(srce.Current), srce.Current);
                while (srce.MoveNext()) {
                    yield return seed;
                    seed = (combineFn(seed, srce.Current), srce.Current);
                }
                yield return seed;
            }
        }
    }
}
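For readers unfamiliar with the scan idea, here is a minimal sketch contrasting it with Aggregate. The Scan helper and the ScanDemo class below are hypothetical stand-ins written for illustration; the actual Scan method mentioned above is not shown in this answer and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanSketch {
    // Hypothetical helper: like Aggregate, but yields every intermediate accumulator value.
    public static IEnumerable<TAcc> Scan<T, TAcc>(this IEnumerable<T> src, TAcc seed, Func<TAcc, T, TAcc> combineFn) {
        var acc = seed;
        foreach (var item in src) {
            acc = combineFn(acc, item);
            yield return acc;
        }
    }
}

public static class ScanDemo {
    // Aggregate collapses the sequence to the final accumulator only.
    public static int Total(IEnumerable<int> nums) =>
        nums.Aggregate(0, (acc, n) => acc + n);

    // Scan keeps one accumulator value per source item.
    public static List<int> RunningTotals(IEnumerable<int> nums) =>
        nums.Scan(0, (acc, n) => acc + n).ToList();
}
```

For the input 1, 2, 3, 4, Total returns 10 while RunningTotals returns 1, 3, 6, 10. ScanPair goes one step further and pairs each of those intermediate values with the source item that produced it.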
Now you can use a tuple as the intermediate result to track the first timestamp of the current group and the group number, moving on to the next (timestamp, group number) when the interval exceeds 15 minutes. If you first group by the interaction, and then count the 15-minute groups within each interaction, you get the answer:
var ans = interactionLogs.GroupBy(il => new { il.DeviceId, il.Id, il.Direction })
                         .Select(ilg => new {
                             ilg.Key,
                             Count = ilg.OrderBy(il => il.Timestamp)
                                        .ScanPair(il => (firstTimestamp: il.Timestamp, groupNum: 1),
                                                  (kvp, cur) => (cur.Timestamp - kvp.Key.firstTimestamp).TotalMinutes <= 15
                                                                    ? kvp.Key
                                                                    : (cur.Timestamp, kvp.Key.groupNum + 1))
                                        .GroupBy(ilkvp => ilkvp.Key.groupNum, ilkvp => ilkvp.Value)
                                        .Count()
                         });
Here is part of a sample of the intermediate results from ScanPair. The actual result is a ValueTuple with two fields, where Key is the intermediate result (itself a ValueTuple of firstTimestamp, groupNum) and Value is the corresponding source (log) item. The function-seeded version passes the first source item to the seed function to begin the process.
Key_firstTimestamp  Key_groupNum  Timestamp
7:58 PM             1             7:58 PM
7:58 PM             1             8:08 PM
7:58 PM             1             8:12 PM
8:15 PM             2             8:15 PM
8:15 PM             2             8:20 PM
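If the helper feels like too much machinery, the same windowing rule can be written as a plain loop. This is an alternative sketch, not the ScanPair approach itself: it assumes the Data class from the question with StartTime as the timestamp, and the names InteractionCounter, CountEvents and window are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Data {
    public Guid DeviceId { get; set; }
    public DateTime StartTime { get; set; }
    public DateTime EndDateTime { get; set; }
    public int Id { get; set; }
    public int Direction { get; set; }
}

public static class InteractionCounter {
    // Counts events per (DeviceId, Id, Direction). A new event starts whenever a log's
    // StartTime is more than `window` after the first log of the current event.
    public static Dictionary<(Guid DeviceId, int Id, int Direction), int> CountEvents(
            IEnumerable<Data> logs, TimeSpan window) {
        var counts = new Dictionary<(Guid DeviceId, int Id, int Direction), int>();
        foreach (var grp in logs.GroupBy(d => (d.DeviceId, d.Id, d.Direction))) {
            int events = 0;
            DateTime windowStart = DateTime.MinValue;
            foreach (var log in grp.OrderBy(d => d.StartTime)) {
                if (events == 0 || log.StartTime - windowStart > window) {
                    events++;                     // this log opens a new event
                    windowStart = log.StartTime;  // and becomes its first timestamp
                }
            }
            counts[grp.Key] = events;
        }
        return counts;
    }
}
```

Called as InteractionCounter.CountEvents(logs, TimeSpan.FromMinutes(15)) on the five sample logs from the question (with each dataN populated as its comment intends), this gives 1 event for Id 555 and 2 events for Id 600, so 3 interactions in total. Like the query above, the window is measured from the first log of each event, not from the previous log.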
Upvotes: 2