1RB
1RB

Reputation: 23

Clickhouse: Sliding / moving window

I'm looking for an efficient way on how to query the n past values as array in ClickHouse for each row ordered by one column (i.e. Time), where the values should be retrieved as array.

Window functions are still not supported in ClickHouse (see #1469), so I was hoping for a work-around using aggregation functions like groupArray()?

Table:

Time  | Value
12:11 | 1
12:12 | 2
12:13 | 3
12:14 | 4
12:15 | 5
12:16 | 6

Expected result with a window of size n=3:

Time  | Value
12:13 | [1,2,3]
12:14 | [2,3,4]
12:15 | [3,4,5]
12:16 | [4,5,6]

What are the ways/functions currently used in ClickHouse to efficiently query a sliding/moving window and how can I achieve my desired result?

EDIT:

My solution based on response of @vladimir:

select max(Time) as Time, groupArray(Value) as Values
from (
    select
        *,
        rowNumberInAllBlocks() as row_number,
        arrayJoin(range(row_number, row_number + 3)) as window_id
    from (
        /* BEGIN emulate origin dataset */
        select toDateTime(a) as Time, rowNumberInAllBlocks()+1 as Value
        from (
            select arrayJoin([
                '2020-01-01 12:11:00',
                '2020-01-01 12:12:00',
                '2020-01-01 12:13:00',
                '2020-01-01 12:14:00',
                '2020-01-01 12:15:00',
                '2020-01-01 12:16:00']) a
        )
        order by Time
        /* END emulate origin dataset */
    )
    order by Time
) s
group by window_id
having length(Values) = 3
order by Time

Note that 3 appears twice in the query and represents the window size n.

Output:

┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘

Upvotes: 2

Views: 4060

Answers (2)

Nab
Nab

Reputation: 1

For Clickhouse 19, where range function takes only single input, you can use following query

select max(Time) as Time, groupArray(Value) as Values
from (
select
        *,
        rowNumberInAllBlocks() as row_number,
        arrayJoin( arrayMap(x -> x + row_number, range(3)) ) as window_id
    from (
        /* BEGIN emulate origin dataset */
        select toDateTime(a) as Time, rowNumberInAllBlocks()+1 as Value
        from (
            select arrayJoin([
                '2020-01-01 12:11:00',
                '2020-01-01 12:12:00',
                '2020-01-01 12:13:00',
                '2020-01-01 12:14:00',
                '2020-01-01 12:15:00',
                '2020-01-01 12:16:00']) a
        )
        order by Time
        /* END emulate origin dataset */
    )
order by Time
) s
group by window_id
having length(Values) = 3
order by Time

Upvotes: 0

vladimir
vladimir

Reputation: 15226

Starting from version 21.4 added the full support of window-functions. At this moment it was marked as an experimental feature.

SELECT
    Time,
    groupArray(any(Value)) OVER (ORDER BY Time ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS Values
FROM 
(
    /* Emulate the test dataset, */
    select toDateTime(a) as Time, rowNumberInAllBlocks()+1 as Value
    from (
        select arrayJoin([
            '2020-01-01 12:11:00',
            '2020-01-01 12:12:00',
            '2020-01-01 12:13:00',
            '2020-01-01 12:14:00',
            '2020-01-01 12:15:00',
            '2020-01-01 12:16:00']) a
    )
    order by Time
)
GROUP BY Time
SETTINGS allow_experimental_window_functions = 1

/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [1]     │
│ 2020-01-01 12:12:00 │ [1,2]   │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
*/

See https://altinity.com/blog/clickhouse-window-functions-current-state-of-the-art.


ClickHouse has several datablock-scoped window functions, let's take neighbor:

SELECT Time, [neighbor(Value, -2), neighbor(Value, -1), neighbor(Value, 0)] Values
FROM (
  /* emulate origin data */
  SELECT toDateTime(data.1) as Time, data.2 as Value
  FROM (
    SELECT arrayJoin([('2020-01-01 12:11:00', 1),
    ('2020-01-01 12:12:00', 2),
    ('2020-01-01 12:13:00', 3),
    ('2020-01-01 12:14:00', 4),
    ('2020-01-01 12:15:00', 5),
    ('2020-01-01 12:16:00', 6)]) as data)
  )

/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [0,0,1] │
│ 2020-01-01 12:12:00 │ [0,1,2] │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘

*/

An alternate way based on the duplication of source rows by window_size times:

SELECT   
  arrayReduce('max', arrayMap(x -> x.1, raw_result)) Time,
  arrayMap(x -> x.2, raw_result) Values
FROM (  
  SELECT groupArray((Time, Value)) raw_result, max(row_number) max_row_number
  FROM (
    SELECT 
      3 AS window_size,
      *, 
      rowNumberInAllBlocks() row_number,
      arrayJoin(arrayMap(x -> x + row_number, range(window_size))) window_id
    FROM (
      /* emulate origin dataset */
      SELECT toDateTime(data.1) as Time, data.2 as Value
      FROM (
        SELECT arrayJoin([('2020-01-01 12:11:00', 1),
          ('2020-01-01 12:12:00', 2),
          ('2020-01-01 12:13:00', 3),
          ('2020-01-01 12:14:00', 4),
          ('2020-01-01 12:15:00', 5),
          ('2020-01-01 12:16:00', 6)]) as data)
      ORDER BY Value
      )
    )
  GROUP BY window_id
  HAVING max_row_number = window_id
  ORDER BY window_id
  )
/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [1]     │
│ 2020-01-01 12:12:00 │ [1,2]   │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
*/

Extra example:

SELECT   
  arrayReduce('max', arrayMap(x -> x.1, raw_result)) id,
  arrayMap(x -> x.2, raw_result) values
FROM (  
  SELECT groupArray((id, value)) raw_result, max(row_number) max_row_number
  FROM (
    SELECT 
      48 AS window_size,
      *, 
      rowNumberInAllBlocks() row_number,
      arrayJoin(arrayMap(x -> x + row_number, range(window_size))) window_id
    FROM (
      /* the origin dataset */
      SELECT number AS id, number AS value
      FROM numbers(4096) 
      )
    )
  GROUP BY window_id
  HAVING max_row_number = window_id
  ORDER BY window_id
  )
/*
┌─id─┬─values────────────────┐
│  0 │ [0]                   │
│  1 │ [0,1]                 │
│  2 │ [0,1,2]               │
│  3 │ [0,1,2,3]             │
│  4 │ [0,1,2,3,4]           │
│  5 │ [0,1,2,3,4,5]         │
│  6 │ [0,1,2,3,4,5,6]       │
│  7 │ [0,1,2,3,4,5,6,7]     │
..
│ 56 │ [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]  │
│ 57 │ [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57] │
│ 58 │ [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58] │
│ 59 │ [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59] │
│ 60 │ [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60] │
..
│ 4093 │ [4046,4047,4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093] │
│ 4094 │ [4047,4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093,4094] │
│ 4095 │ [4048,4049,4050,4051,4052,4053,4054,4055,4056,4057,4058,4059,4060,4061,4062,4063,4064,4065,4066,4067,4068,4069,4070,4071,4072,4073,4074,4075,4076,4077,4078,4079,4080,4081,4082,4083,4084,4085,4086,4087,4088,4089,4090,4091,4092,4093,4094,4095] │
└──────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
*/

Upvotes: 2

Related Questions