Alexey Zalyotov
Alexey Zalyotov

Reputation: 494

PostgreSQL / TimescaleDB percentile in partial mode

Let's say I have a results table:

result_id attr_id user_id value timestamp
1 1 1 100 2024-02-10 14:30:15.248087+00
2 2 1 111 2024-02-10 10:30:15.248087+00
3 1 1 122 2024-02-09 14:30:15.248087+00
4 2 1 162 2024-02-08 10:30:15.248087+00
5 1 2 119 2024-02-10 14:30:15.248087+00
6 2 2 128 2024-02-10 10:30:15.248087+00
7 1 2 137 2024-02-09 14:30:15.248087+00
8 2 2 146 2024-02-08 10:30:15.248087+00

I need to calculate percentile for every row over partition by user_id and attr_id preceding current row and only in 10 days interval. I can calculate standard deviation in this way because it supports partial mode:

SELECT
  stddev(value) OVER (
    PARTITION BY user_id, attr_id
    ORDER BY timestamp ASC RANGE BETWEEN '10 days'::interval PRECEDING AND CURRENT ROW
    EXCLUDE CURRENT ROW
  ) AS stddev_efficiency
FROM results;

Is there a way to calculate percentile in postgresql/timescaleDB in accordance with the described requirements?

Upvotes: 0

Views: 78

Answers (1)

jjanes
jjanes

Reputation: 44305

Presumably the problem is with percentile_cont (and friends) being an "ordered set aggregate" and so being ineligible to be used with/as a window function.

You can find a discussion of work-arounds for that issue on the PostgreSQL wiki. Basically you would implement your own aggregate function which is not an "ordered set aggregate" and so keeps the state in memory the normal way instead, but then can be used as window functions. You will probably need to do this in C to get good performance. You might need to deal with memory issues as well, but there isn't enough information here to advice further on that.

Upvotes: 0

Related Questions