Reputation: 21
I have a table with the following schema:
uid | visit name | visit date | sales quantity |
---|---|---|---|
xyz | visit 1 | 2020-01-01 | 29 |
xyz | visit 2 | 2020-01-03 | 250 |
xyz | visit 3 | 2020-01-04 | 20 |
xyz | visit 4 | 2020-01-27 | 21 |
abc | visit 1 | 2020-02-01 | 29 |
abc | visit 2 | 2020-03-03 | 34 |
abc | visit 3 | 2020-04-04 | 35 |
abc | visit 4 | 2020-04-27 | 41 |
Each unique id has a few unique visits that repeat for every unique id, at every visit I have to calculate what the two most highest sales quantity is per user- across their prior visits(ascending order) up until the current visit named in the row for each unique id and excluding the current row.
output would be- the same table plus these columns
max sale | 2nd max sale | avg of both max sales |
---|
I have used window functions for the maximum value, but I am struggling to get the second highest value of sales for every user for every row. Is this doable using sql? If so what would the script look like?
Upvotes: 2
Views: 735
Reputation: 59165
Update: I re-wrote my answer, because the previous one ignored certain requirements.
To keep track of the 2 previous top values, you can write a UDTF in JS to hold that ranking:
create or replace function udtf_top2_before(points float)
returns table (output_col array)
language javascript
as $$
{
processRow: function f(row, rowWriter, context){
rowWriter.writeRow({OUTPUT_COL: this.prevmax.slice().reverse()});
this.prevmax.push(row.POINTS);
// silly js sort https://stackoverflow.com/a/21595293/132438
this.prevmax = this.prevmax.sort(function (a, b) {return a - b;}).slice(-2);
}
, initialize: function(argumentInfo, context) {
this.prevmax = [];
}
}
$$;
Then that tabular UDF can will give you the numbers as expected:
with data as (
select v:author::string author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
from reddit_comments_sample
where v:subreddit = 'wallstreetbets'
)
select author, score, ts
, output_col[0] prev_max
, output_col[1] prev_max2
, (prev_max+ifnull(prev_max2,prev_max))/2 avg
from (
select author, score, ts, output_col
from data, table(udtf_top2_before(score::float) over(partition by author order by ts))
order by author, ts
limit 100
)
UDTF based on my previous post:
Previously:
You can use row_number() over()
to select the top 2, and then pivot with an array_agg()
:
with data as (
select v:author author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
from reddit_comments_sample
where v:subreddit = 'wallstreetbets'
)
select author, arr[0] max_score, arr[1] max_score_2, (max_score+ifnull(max_score_2,max_score))/2 avg
from (
select author
, array_agg(score) within group (order by score::int desc) arr
from (
select author, score, ts
from data
qualify row_number() over(partition by author order by score desc) <= 2
)
group by 1
)
order by 4 desc
Upvotes: 1