Eric M Sheetz
Eric M Sheetz

Reputation: 25

Finidng ranges of values (5 points) across ranges of time (1 hr)

I'm monitoring the effectiveness of a temperature controller in an IoT application. I'm trying to find "interesting points" in the time series. These are similar to local minima or maxima but include points on the curve when a trend is beginning. It's not just min and max. It's where a range of 5 points occurred within an hour.

Source

| inMins | unixTime         | temp |  
|--------|------------------|------|   
| 0      | 1479042000000000 | 10.0 |  
| 5      | 1479042300000000 | 11.0 |  
| 10     | 1479042600000000 | 12.0 |  
| 15     | 1479042600000000 | 13.0 |  
| 20     | 1479043200000000 | 14.0 |  
| 25     | 1479043500000000 | 15.0 |  
| 30     | 1479043800000000 | 14.0 |  
| 35     | 1479044100000000 | 13.0 |  
| 40     | 1479044400000000 | 12.0 |  
| 45     | 1479044700000000 | 11.0 |  
| 50     | 1479045000000000 | 10.0 |  
| 55     | 1479045300000000 | 9.0  |  
| 60     | 1479045600000000 | 8.0  |  
| 65     | 1479045900000000 | 9.0  |  
| 70     | 1479046200000000 | 10.0 |  
| 75     | 1479046500000000 | 11.0 |  
| 80     | 1479046800000000 | 12.0 |  
| 85     | 1479047100000000 | 13.0 |  
| 90     | 1479047400000000 | 14.0 |  

Desired Shape

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 0      | 1479042000000000 | 10.0 | 1         |  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |
| 35     | 1479044100000000 | 13.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  

My current results have some problems

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  
| 70     | 1479046200000000 | 10.0 | 1         |  
| 75     | 1479046500000000 | 11.0 | 1         |  
| 80     | 1479046800000000 | 12.0 | 1         |  
| 85     | 1479047100000000 | 13.0 | 1         |  
| 90     | 1479047400000000 | 14.0 | 1         |  

SQL

Select 
  inMins,
  unixTime,
  temp,
  coldOrHot
from 
(Select
  inMins,
  unixTime,
  temp,
  -- 1 means Cold, 2 means Hot, 0 is noise
  if(temp=theLowInWindowDesc,1,
  if(temp=theHighInWindowDesc,2,0)) as coldOrHot,
  theHighInWindowDesc,
  theLowInWindowDesc
FROM
  (SELECT
  inMins,
  unixTime,
  temp,
  theHighInWindowDesc,
  theLowInWindowDesc
  FROM
    (Select
        inMins,
        unixTime,
        temp,
        MAX(temp) OVER(ORDER BY
          unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
          AND CURRENT ROW) AS theHighInWindowDesc,
        MIN(temp) OVER(ORDER BY
         unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
         AND CURRENT ROW) AS theLowInWindowDesc
        FROM
        [esheetzbq:findingLocalExtrema.timeSeriesForKevin]
        ORDER BY
        inMins asc
     )
  )
)
where coldOrHot=1 or coldOrHot=2 

Problems

  1. I'm not picking up the "cold" at minute 0 when temp was 10 and increased by 5 points within 25 mins
  2. I'm not picking up the "hot" value at minute 35.
  3. The results from mins 70 to 90 don't take into account my range criteria of 5 points and is occurring because my current logic is based on extremes not ranges. The SQL windowing function of ""OVER"" picks up less than an hour's worth of rows in the final hour of the data set. This is expected behavior, and I'm uncertain what logic is best used to exclude records that give warnings without seeing a range of 5 points.
  4. Will this scale? I'll be running this logic over a record set of about 34M rows.

Upvotes: 2

Views: 104

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Here we go. Below is for BigQuery Standard SQL
I havent done any attempts to improve/optimize query - rather purposely kept it "broken" to subqueries exactly the way i wrote them - to make sure the logic is easily traced and thus understood
I have included below data for easy testing but you can comment out data portion if you want to test it on real data

Have fun :o)

#standardSQL
WITH `esheetzbq.findingLocalExtrema.timeSeriesForKevin` AS (
  SELECT 0 AS inMins, 1479042000000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 5 AS inMins, 1479042300000000 AS unixTime, 11.0 AS temp UNION ALL   SELECT 10 AS inMins, 1479042600000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 15 AS inMins, 1479042900000000 AS unixTime, 13.0 AS temp UNION ALL       SELECT 20 AS inMins, 1479043200000000 AS unixTime, 14.0 AS temp UNION ALL   SELECT 25 AS inMins, 1479043500000000 AS unixTime, 15.0 AS temp UNION ALL   SELECT 30 AS inMins, 1479043800000000 AS unixTime, 14.0 AS temp UNION ALL   SELECT 35 AS inMins, 1479044100000000 AS unixTime, 13.0 AS temp UNION ALL
  SELECT 40 AS inMins, 1479044400000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 45 AS inMins, 1479044700000000 AS unixTime, 11.0 AS temp UNION ALL   SELECT 50 AS inMins, 1479045000000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 55 AS inMins, 1479045300000000 AS unixTime, 9.0 AS temp UNION ALL       SELECT 60 AS inMins, 1479045600000000 AS unixTime, 8.0 AS temp UNION ALL   SELECT 65 AS inMins, 1479045900000000 AS unixTime, 9.0 AS temp UNION ALL   SELECT 70 AS inMins, 1479046200000000 AS unixTime, 10.0 AS temp UNION ALL   SELECT 75 AS inMins, 1479046500000000 AS unixTime, 11.0 AS temp UNION ALL       SELECT 80 AS inMins, 1479046800000000 AS unixTime, 12.0 AS temp UNION ALL   SELECT 85 AS inMins, 1479047100000000 AS unixTime, 13.0 AS temp UNION ALL   SELECT 90 AS inMins, 1479047400000000 AS unixTime, 14.0 AS temp UNION ALL        SELECT 95 AS inMins, 1479047700000000 AS unixTime, 15 AS temp UNION ALL SELECT 100 AS inMins, 1479048000000000 AS unixTime, 16 AS temp UNION ALL  SELECT 105 AS inMins, 1479048300000000 AS unixTime, 17 AS temp UNION ALL SELECT 110 AS inMins, 1479048600000000 AS unixTime, 18 AS temp UNION ALL 
  SELECT 115 AS inMins, 1479048900000000 AS unixTime, 19 AS temp UNION ALL SELECT 120 AS inMins, 1479049200000000 AS unixTime, 20 AS temp UNION ALL      SELECT 125 AS inMins, 1479049500000000 AS unixTime, 21 AS temp UNION ALL SELECT 130 AS inMins, 1479049800000000 AS unixTime, 22 AS temp UNION ALL       SELECT 135 AS inMins, 1479050100000000 AS unixTime, 23 AS temp UNION ALL SELECT 140 AS inMins, 1479050400000000 AS unixTime, 24 AS temp UNION ALL      SELECT 145 AS inMins, 1479050700000000 AS unixTime, 25 AS temp UNION ALL SELECT 150 AS inMins, 1479051000000000 AS unixTime, 26 AS temp UNION ALL       SELECT 155 AS inMins, 1479051300000000 AS unixTime, 27 AS temp UNION ALL SELECT 160 AS inMins, 1479051600000000 AS unixTime, 28 AS temp UNION ALL      SELECT 165 AS inMins, 1479051900000000 AS unixTime, 29 AS temp UNION ALL SELECT 170 AS inMins, 1479052200000000 AS unixTime, 30 AS temp UNION ALL       SELECT 175 AS inMins, 1479052500000000 AS unixTime, 31 AS temp UNION ALL SELECT 180 AS inMins, 1479052800000000 AS unixTime, 32 AS temp UNION ALL      SELECT 185 AS inMins, 1479053100000000 AS unixTime, 33 AS temp UNION ALL SELECT 190 AS inMins, 1479053400000000 AS unixTime, 34 AS temp UNION ALL       SELECT 195 AS inMins, 1479053700000000 AS unixTime, 35 AS temp UNION ALL SELECT 200 AS inMins, 1479054000000000 AS unixTime, 36 AS temp UNION ALL      SELECT 205 AS inMins, 1479054300000000 AS unixTime, 37 AS temp UNION ALL SELECT 210 AS inMins, 1479054600000000 AS unixTime, 38 AS temp  
), y AS (
  SELECT inMins, unixTime, temp, delta,
    IFNULL(SUM(new_group_flag) OVER(ORDER BY unixTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
  FROM (
    SELECT inMins, unixTime, temp,
      LEAD(temp) OVER(ORDER BY unixTime) - temp AS delta,
      CAST(SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp) != IFNULL(SIGN(temp - LAG(temp) OVER(ORDER BY unixTime)), SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp)) AS INT64) AS new_group_flag
    FROM `esheetzbq.findingLocalExtrema.timeSeriesForKevin`
  )
), yy AS (
  SELECT inMins, unixTime, temp, delta, grp FROM y UNION ALL
  SELECT inMins, unixTime, temp, delta, grp + 1 AS grp
  FROM (
    SELECT inMins, unixTime, temp, delta, grp, 
      unixTime - MAX(unixTime) OVER(PARTITION BY grp ORDER BY unixTime DESC) AS qq
    FROM y
  ) WHERE qq = 0
), v AS (
  SELECT inMins, unixTime, temp, delta,
     MIN(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS min_temp,
     MAX(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS max_temp
  FROM yy
)
SELECT inMins, unixTime, temp, IF(delta=1, 1, 2) AS coldOrHot
FROM v
WHERE  ABS(max_temp - temp) >= 5 OR ABS(min_temp - temp) >= 5

In case you will go this direction check also Enabling Standard SQL and Migrating from legacy SQL for more details if needed

Upvotes: 2

Related Questions