KCIsLearning

Reputation: 289

Time series LINQ query

I have a central repository for IoT device logs. Each log carries a timestamp as it arrives. The problem I want to solve is that, over a given time span, the same device might send multiple logs about its interaction with a specific catalyst. I want to treat that set of logs as a single event rather than, say, 5 disparate logs, so that I count the number of interactions and not the number of logs.

Data Set

public class Data
{
    public Guid DeviceId {get; set;}
    public DateTime StartTime { get; set; }
    public DateTime EndDateTime { get; set; }
    public int Id { get; set; }
    public int Direction { get; set;}
}

Data d1 = new Data();// imagine it's populated
Data d2 = new Data();// imagine it's populated

I am looking for a LINQ query that would yield something along the lines of

if (d1.DeviceId == d2.DeviceId && d1.Id == d2.Id && d1.Direction == d2.Direction && (d1.StartTime - d2.StartTime).Duration() < TimeSpan.FromMinutes(15)) { /* d1 and d2 belong to the same event */ }

If I know that the same IoT device is interacting with the same Id (catalyst) in the same Direction, and all of those logs occur within a 15-minute time span, it can be presumed that they correspond to the same catalyst event.

I do not control the log creation, so no, I cannot update the data to include "something" that would indicate the relationship.

Sample data, per request; nothing fancy. I am sure most people suspect that I have 30+ properties and am only providing the ones that affect the calculation, but this is a simple set of possibilities.

class SampleData
{
    public List<Data> GetSampleData()
    {
        Guid device1 = Guid.NewGuid();

        List<Data> dataList = new List<Data>();

        Data  data1 = new Data();
        data1.DeviceId = device1;
        data1.Id = 555;
        data1.Direction = 1; 
        data1.StartTime = new DateTime(2010, 8, 18, 16, 32, 0);
        data1.EndDateTime = new DateTime(2010, 8, 18, 16, 32, 30);
        dataList.Add(data1);

        // Should be excluded from the final result (same Id, within 15 minutes of data1)
        Data data2 = new Data();
        data2.DeviceId = device1;
        data2.Id = 555;
        data2.Direction = 1;
        data2.StartTime = new DateTime(2010, 8, 18, 16, 32, 32);
        data2.EndDateTime = new DateTime(2010, 8, 18, 16, 33, 30);
        dataList.Add(data2);

        // Should be included because the Id is different
        Data data3 = new Data();
        data3.DeviceId = device1;
        data3.Id = 600;
        data3.Direction = 1;
        data3.StartTime = new DateTime(2010, 8, 18, 16, 32, 2);
        data3.EndDateTime = new DateTime(2010, 8, 18, 16, 32, 35);
        dataList.Add(data3);

        // Exclude due to time (same Id, within 15 minutes of data3)
        Data data4 = new Data();
        data4.DeviceId = device1;
        data4.Id = 600;
        data4.Direction = 1;
        data4.StartTime = new DateTime(2010, 8, 18, 16, 32, 37);
        data4.EndDateTime = new DateTime(2010, 8, 18, 16, 33, 40);
        dataList.Add(data4);

        // Include because time > 15 minutes
        Data data5 = new Data();
        data5.DeviceId = device1;
        data5.Id = 600;
        data5.Direction = 1;
        data5.StartTime = new DateTime(2010, 8, 18, 16, 58, 42);
        data5.EndDateTime = new DateTime(2010, 8, 18, 16, 58, 50);
        dataList.Add(data5);

        return dataList;
    }
}

Upvotes: 2

Views: 447

Answers (1)

NetMage

Reputation: 26907

This turned out to be more complex than I hoped for.

I used a custom LINQ extension method of mine called ScanPair. It is a variation of my Scan method, which is a version of the APL scan operator (like Aggregate, but it returns the intermediate results). ScanPair returns the intermediate results of the operation along with each original value. I should think about how to make all of these more general-purpose, as the same pattern is used by a bunch of other extension methods I have for grouping by various conditions (e.g. sequential, runs, while a test is true or false).
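For anyone who has not met the scan operator before, here is a minimal sketch of what a Scan extension could look like. This is an assumption for illustration only, not the actual Scan method mentioned above; the class names ScanSketch and ScanDemo and the exact signature are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanSketch {
    // Hypothetical Scan: folds like Aggregate, but yields the accumulator after every item.
    public static IEnumerable<TAcc> Scan<T, TAcc>(this IEnumerable<T> src, TAcc seed, Func<TAcc, T, TAcc> combineFn) {
        var acc = seed;
        foreach (var item in src) {
            acc = combineFn(acc, item);
            yield return acc;
        }
    }
}

public static class ScanDemo {
    public static void Main() {
        var values = new[] { 1, 2, 3, 4 };

        // Aggregate returns only the final result.
        Console.WriteLine(values.Aggregate(0, (acc, x) => acc + x));               // 10

        // Scan returns every intermediate result.
        Console.WriteLine(string.Join(", ", values.Scan(0, (acc, x) => acc + x))); // 1, 3, 6, 10
    }
}
```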

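using System;
using System.Collections.Generic;

public static class IEnumerableExt {
    // Pairs each source item with a running key: seedFn builds the key for the first item,
    // and combineFn derives each later key from the previous (Key, Value) pair and the current item.
    public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> seedFn, Func<(TKey Key, T Value), T, TKey> combineFn) {
        using (var srce = src.GetEnumerator()) {
            if (srce.MoveNext()) {
                var seed = (seedFn(srce.Current), srce.Current);

                while (srce.MoveNext()) {
                    yield return seed; // emit the previous pair before advancing
                    seed = (combineFn(seed, srce.Current), srce.Current);
                }
                yield return seed; // emit the final pair
            }
        }
    }
}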
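To see what ScanPair produces before applying it to timestamps, here is a small sketch that uses it for a running sum over three integers. ScanPair is copied from above so the example compiles on its own; the ScanPairDemo class and the input values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IEnumerableExt {
    // ScanPair as defined above, repeated so this example is self-contained.
    public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> seedFn, Func<(TKey Key, T Value), T, TKey> combineFn) {
        using (var srce = src.GetEnumerator()) {
            if (srce.MoveNext()) {
                var seed = (seedFn(srce.Current), srce.Current);
                while (srce.MoveNext()) {
                    yield return seed;
                    seed = (combineFn(seed, srce.Current), srce.Current);
                }
                yield return seed;
            }
        }
    }
}

public static class ScanPairDemo {
    public static void Main() {
        // Key is the running sum so far, Value is the original item.
        var pairs = new[] { 3, 4, 5 }.ScanPair(x => x, (prev, cur) => prev.Key + cur);

        foreach (var (key, value) in pairs)
            Console.WriteLine($"Key={key} Value={value}");
        // Key=3 Value=3
        // Key=7 Value=4
        // Key=12 Value=5
    }
}
```

The first item only goes through the seed function; every later item sees the previous (Key, Value) pair, which is what lets the key carry state such as a window start and a group number.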

Now you can use a tuple as the intermediate result to track the initial timestamp and the group number, and move on to the next (timestamp, group number) when the interval exceeds 15 minutes. If you first group by the interaction and then count the 15-minute groups within each interaction, you get the answer. The query assumes each log exposes a Timestamp property; with the question's Data class that would presumably be StartTime.

var ans = interactionLogs.GroupBy(il => new { il.DeviceId, il.Id, il.Direction })
            .Select(ilg => new {
                ilg.Key,
                Count = ilg.OrderBy(il => il.Timestamp)
                           .ScanPair(il => (firstTimestamp: il.Timestamp, groupNum: 1),
                                     (kvp, cur) => (cur.Timestamp - kvp.Key.firstTimestamp).TotalMinutes <= 15
                                                       ? kvp.Key
                                                       : (cur.Timestamp, kvp.Key.groupNum + 1))
                           .GroupBy(ilkvp => ilkvp.Key.groupNum, ilkvp => ilkvp.Value)
                           .Count()
            });
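As a rough end-to-end check, here is a sketch that runs the same query shape over the question's five sample logs. It assumes StartTime plays the role of Timestamp, drops EndDateTime because the count does not use it, and sums the per-interaction counts into one total. EventCounter, CountEvents and SampleLogs are hypothetical names, and the sample assumes each data object is populated the way its comment in the question intends.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the question's Data class (EndDateTime is not needed for the count).
public class Data {
    public Guid DeviceId { get; set; }
    public DateTime StartTime { get; set; }
    public int Id { get; set; }
    public int Direction { get; set; }
}

public static class IEnumerableExt {
    // ScanPair from the answer, repeated so this example compiles on its own.
    public static IEnumerable<(TKey Key, T Value)> ScanPair<T, TKey>(this IEnumerable<T> src, Func<T, TKey> seedFn, Func<(TKey Key, T Value), T, TKey> combineFn) {
        using (var srce = src.GetEnumerator()) {
            if (srce.MoveNext()) {
                var seed = (seedFn(srce.Current), srce.Current);
                while (srce.MoveNext()) {
                    yield return seed;
                    seed = (combineFn(seed, srce.Current), srce.Current);
                }
                yield return seed;
            }
        }
    }
}

public static class EventCounter {
    // Hypothetical helper: total number of events across all (DeviceId, Id, Direction) groups.
    public static int CountEvents(IEnumerable<Data> logs) =>
        logs.GroupBy(l => new { l.DeviceId, l.Id, l.Direction })
            .Sum(g => g.OrderBy(l => l.StartTime)
                       .ScanPair(l => (firstTimestamp: l.StartTime, groupNum: 1),
                                 (kvp, cur) => (cur.StartTime - kvp.Key.firstTimestamp).TotalMinutes <= 15
                                                   ? kvp.Key
                                                   : (firstTimestamp: cur.StartTime, groupNum: kvp.Key.groupNum + 1))
                       .GroupBy(p => p.Key.groupNum)
                       .Count());

    // The question's five sample logs, all from one device with Direction = 1.
    public static List<Data> SampleLogs() {
        var device1 = Guid.NewGuid();
        Data Log(int id, int h, int m, int s) => new Data {
            DeviceId = device1, Id = id, Direction = 1,
            StartTime = new DateTime(2010, 8, 18, h, m, s)
        };
        return new List<Data> {
            Log(555, 16, 32, 0),   // event 1
            Log(555, 16, 32, 32),  // same event as the previous log
            Log(600, 16, 32, 2),   // event 2 (different Id)
            Log(600, 16, 32, 37),  // same event as the previous log
            Log(600, 16, 58, 42)   // event 3 (more than 15 minutes after the window start)
        };
    }

    public static void Main() {
        Console.WriteLine(CountEvents(SampleLogs())); // 3
    }
}
```

Note that the 15 minutes are measured from the first log of each window, not from the previous log, so a steady trickle of logs is still split into a new event once it runs past 15 minutes from the window start.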

Here is a portion of a sample of the intermediate results from ScanPair. Each result is a ValueTuple with two fields: Key is the intermediate result (itself a ValueTuple of firstTimestamp and groupNum) and Value is the corresponding source (log) item. The seed-function version passes the first source item to the seed function to begin the process.

Key_firstTimestamp  Key_groupNum    Timestamp
7:58 PM                 1           7:58 PM
7:58 PM                 1           8:08 PM
7:58 PM                 1           8:12 PM
8:15 PM                 2           8:15 PM
8:15 PM                 2           8:20 PM
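The bookkeeping behind those rows can also be written as a plain loop, which may make the window logic easier to follow. This is an equivalent sketch, not the code that produced the table; WindowTrace and Trace are hypothetical names, and the date is arbitrary because the sample only shows times of day.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WindowTrace {
    // For timestamps already in ascending order, record the window start and group number
    // of each one. A new window opens when an item is more than 15 minutes past the
    // start of the current window.
    public static List<(DateTime firstTimestamp, int groupNum, DateTime timestamp)> Trace(IEnumerable<DateTime> sorted) {
        var rows = new List<(DateTime firstTimestamp, int groupNum, DateTime timestamp)>();
        var first = default(DateTime);
        var groupNum = 0;
        foreach (var ts in sorted) {
            if (groupNum == 0 || (ts - first).TotalMinutes > 15) {
                first = ts;
                groupNum++;
            }
            rows.Add((first, groupNum, ts));
        }
        return rows;
    }

    public static void Main() {
        var day = new DateTime(2018, 1, 1); // arbitrary date, assumed for illustration
        var times = new[] {
            day.AddHours(19).AddMinutes(58),
            day.AddHours(20).AddMinutes(8),
            day.AddHours(20).AddMinutes(12),
            day.AddHours(20).AddMinutes(15),
            day.AddHours(20).AddMinutes(20)
        };
        foreach (var r in Trace(times)) {
            Console.WriteLine(
                r.firstTimestamp.ToString("HH:mm", CultureInfo.InvariantCulture) + "  " +
                r.groupNum + "  " +
                r.timestamp.ToString("HH:mm", CultureInfo.InvariantCulture));
        }
        // 19:58  1  19:58
        // 19:58  1  20:08
        // 19:58  1  20:12
        // 20:15  2  20:15
        // 20:15  2  20:20
    }
}
```

The 8:15 PM log starts group 2 because it is 17 minutes after 7:58 PM, while 8:12 PM is only 14 minutes after and stays in group 1.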

Upvotes: 2
