Rodrigo Fuentes
Rodrigo Fuentes

Reputation: 35

SQL - Group Table by entries X amount of time from each other

I need to group together the entries in which the timestamp difference between one and the other is X amount of seconds or less than then average the value for each of them for each of the devices. In the following example I have a Table with this data and I need to group by device with entries between 60 seconds from each other.

          Device            Timestamp  Value
0  30:8c:fb:a4:b9:8b  10/26/2015 22:50:15     34
1  30:8c:fb:a4:b9:8b  10/26/2015 22:50:46     34
2  c0:ee:fb:35:ec:cd  10/26/2015 22:50:50     33
3  c0:ee:fb:35:ec:cd  10/26/2015 22:50:51     32
4  30:8c:fb:a4:b9:8b  10/26/2015 22:51:15     34
5  30:8c:fb:a4:b9:8b  10/26/2015 22:51:47     32
6  c0:ee:fb:35:ec:cd  10/26/2015 22:52:38     38
7  30:8c:fb:a4:b9:8b  10/26/2015 22:54:46     34

This should be the resulting Table

          Device           First_seen            Last_seen Average_value
0  30:8c:fb:a4:b9:8b  10/26/2015 22:50:15  10/26/2015 22:51:47          33,5
1  c0:ee:fb:35:ec:cd  10/26/2015 22:50:50  10/26/2015 22:50:51          32,5
2  c0:ee:fb:35:ec:cd  10/26/2015 22:52:38  10/26/2015 22:52:38            38
3  30:8c:fb:a4:b9:8b  10/26/2015 22:54:46  10/26/2015 22:54:46            34

Thank you very much for your help.

Upvotes: 2

Views: 53

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173106

There is an old trick for this!
Mostly based on power of Window functions
Perfectly works for BigQuery!

So, first you "mark" all entries which exceed 60 seconds after previous entry!
Those which exceed getting value 1 and rest getting value 0!

Secondly you define groups by summing all previous marks (of course steps above are done while partitioning by device)

And finally, you just do simple grouping by above defined groups

Three simple steps implemented in one query with few simple sub-selects!
Hope this helps

SELECT device, MIN(ts) AS first_seen, MAX(ts) AS last_seen, AVG(value) AS average_value
FROM (
  SELECT device, ts, value, SUM(grp_start) OVER (PARTITION BY device ORDER BY ts) AS grp
  FROM (
   SELECT device, ts, value, 
   IF(TIMESTAMP_TO_SEC(TIMESTAMP(ts))-TIMESTAMP_TO_SEC(TIMESTAMP(ts0))>60,1,0) AS grp_start
   FROM (
      SELECT device, ts, value, LAG(ts, 1) OVER(PARTITION BY device ORDER BY ts) AS ts0
      FROM yourTable
    )
  )
)
GROUP BY device, grp

Upvotes: 1

Strawberry
Strawberry

Reputation: 33945

Here's one way...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(device CHAR(1) NOT NULL
,timestamp DATETIME NOT NULL
,value INT NOT NULL
,PRIMARY KEY(device,timestamp)
);

INSERT INTO my_table VALUES
('a','2015/10/26 22:50:15',34),
('a','2015/10/26 22:50:46',34),
('b','2015/10/26 22:50:50',33),
('b','2015/10/26 22:50:51',32),
('a','2015/10/26 22:51:15',34),
('a','2015/10/26 22:51:47',32),
('b','2015/10/26 22:52:38',38),
('a','2015/10/26 22:54:46',34);



SELECT m.*
     , AVG(n.value) avg
  FROM 
     ( SELECT a.device
            , a.timestamp start
            , MIN(c.timestamp) end 
         FROM 
            ( SELECT x.*
                   , CASE WHEN x.device = @prev THEN @i:=@i+1 ELSE @i:=1 END i
                   , @prev:=device 
                FROM my_table x
                   , (SELECT @i:=1,@prev:=null) vars 
               ORDER 
                  BY device
                   , timestamp
            ) a
         LEFT 
         JOIN 
            ( SELECT x.*
                   , CASE WHEN x.device = @prev THEN @i:=@i+1 ELSE @i:=1 END i
                   , @prev:=device 
                FROM my_table x
                   , (SELECT @i:=1,@prev:=null) vars 
               ORDER 
                  BY device
                   , timestamp
            ) b
           ON b.device = a.device
          AND b.timestamp > a.timestamp - INTERVAL 60 SECOND
          AND b.i = a.i - 1
         LEFT 
         JOIN 
            ( SELECT x.*
                   , CASE WHEN x.device = @prev THEN @i:=@i+1 ELSE @i:=1 END i
                   , @prev:=device 
                FROM my_table x
                   , (SELECT @i:=1,@prev:=null) vars 
               ORDER 
                  BY device
                   , timestamp
            ) c
           ON c.device = a.device
          AND c.i >= a.i
         LEFT 
         JOIN 
            ( SELECT x.*
                   , CASE WHEN x.device = @prev THEN @i:=@i+1 ELSE @i:=1 END i
                   , @prev:=device 
                FROM my_table x
                   , (SELECT @i:=1,@prev:=null) vars 
               ORDER 
                  BY device
                   , timestamp
            ) d 
           ON d.device = c.device
          AND d.i = c.i + 1
          AND d.timestamp < c.timestamp + INTERVAL 60 SECOND
        WHERE b.i IS NULL 
          AND c.i IS NOT NULL
          AND d.i IS NULL
        GROUP 
           BY a.device
            , a.i
            ) m
         JOIN my_table n
           ON n.device = m.device
          AND n.timestamp BETWEEN start AND end
        GROUP
           BY m.device 
            , m.start;

+--------+---------------------+---------------------+---------+
| device | start               | end                 | avg     |
+--------+---------------------+---------------------+---------+
| a      | 2015-10-26 22:50:15 | 2015-10-26 22:51:47 | 33.5000 |
| a      | 2015-10-26 22:54:46 | 2015-10-26 22:54:46 | 34.0000 |
| b      | 2015-10-26 22:50:50 | 2015-10-26 22:50:51 | 32.5000 |
| b      | 2015-10-26 22:52:38 | 2015-10-26 22:52:38 | 38.0000 |
+--------+---------------------+---------------------+---------+

Upvotes: 0

Related Questions