Reputation: 77
OLD title: way to calculate values for a sequence in a postgresql and greenplum
I have a table with two columns, say txn_dt and txn_amt. For example:
|txn_dt|txn_amt| |----|----| |2016-01-01|12.0| |2016-01-02|0.| |2016-01-03|0.| |2016-01-04|0.| |2016-01-05|0.| |2016-01-06|10.0| |2016-01-07|0.0| |2016-01-08|20.0| |2016-01-09|0.| |2016-01-10|0.| |2016-01-11|0.| |2016-01-12|0.| ... ALL ZEROS ... |2016-01-20|0.| |2016-01-21|0.| |2016-01-22|0.| |2016-01-23|0.| |2016-01-24|12.0|
My intention is to obtain the following values saved:
|txn_dt|txn_amt| |----|----| |2016-01-01|12.0| |2016-01-02|6.| |2016-01-03|3. + 1.25| |2016-01-04|1.5 + 2.5| |2016-01-05|5.| |2016-01-06|10.0| |2016-01-07|5.0 + 10.0| |2016-01-08|20.0| |2016-01-09|10.| |2016-01-10|5.| |2016-01-11|2.5| |2016-01-12|0.| ... ALL ZEROS ... |2016-01-20|0.| |2016-01-21|1.5| |2016-01-22|3.| |2016-01-23|6.| |2016-01-24|12.0|
Let's say A[0 to 23] = [12,0,0,0,0,10,0,20,0,...,0,12]
representing the above values from the txn_amt. I need to set the window size to 3, then I will calculate like
A[1] = A[0]/2
A[2] = A[1]/2
A[3] = A[2]/2
that means
A[i+1] = 0.5*A[i], if A[i+1]==0
We only change up to 3 zero values by the above formula. That is to say, if A[i]>0, A[i+1]=0, but A[i+2]>0, then A[i+1]=0.5*A[i], A[i+2] will not be changed.
After that, I also need to reverse this array A[0 to 23]
to B[0 to 23]
where B[0]==A[23],...,B[23]==A[0]
, and do the above procedure again; then add values point by point in both arrays.
My question is how can I use postgresql to implement it? Can someone please help?
Thanks to Gordon, I updated the code as follows:
select t.*,
cast((case when txn_amt = 0 and prev_1 = 0 and prev_2 = 0
then prev_3 / 8
when txn_amt = 0 and prev_1 = 0
then prev_2 / 4
when txn_amt = 0
then prev_1 / 2
else 0
end) +
(case when txn_amt = 0 and next_1 = 0 and next_2 = 0
then next_3 / 8
when txn_amt = 0 and next_1 = 0
then next_2 / 4
when txn_amt = 0
then next_1 / 2
else 0
end) +
txn_amt
as numeric(18,2)) as new_txn_amt
from (select b.txn_dt, a.txn_amt,
COALESCE( lag(txn_amt) over (order by b.txn_dt),0.0) as prev_1,
COALESCE( lag(txn_amt, 2) over (order by b.txn_dt),0.0) as prev_2,
COALESCE( lag(txn_amt, 3) over (order by b.txn_dt),0.0) as prev_3,
COALESCE( lead(txn_amt) over (order by b.txn_dt),0.0) as next_1,
COALESCE( lead(txn_amt, 2) over (order by b.txn_dt),0.0) as next_2,
COALESCE( lead(txn_amt, 3) over (order by b.txn_dt),0.0) as next_3
from test_txns a
right join (select * from generate_series('2016-01-01'::date,'2016-01-31'::date,'1day') as txn_dt) b
on a.txn_dt = b.txn_dt
order by b.txn_dt
) t;
Upvotes: 0
Views: 253
Reputation: 21
A bit late i know. I found your solution okay but a bit of a pain to edit easily. I find using AVG() , PRECEDING and FOLLOWING does the same job with less code
SELECT
txn_dt ,
AVG(txn_amt) OVER(ORDER BY txn_dt desc ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS avg_txn_amt
FROM test_txns
ORDER BY txn_dt
Upvotes: 1
Reputation: 1270463
This seems like a painful calculation, but you can use lead()
and lag()
and a bunch of arithmetic:
select t.*,
((case when txn_amt = 0 and prev_1 = 0 and prev_2 = 0
then prev_3 / 8
when txn_amt = 0 and prev_1 = 0
then prev_2 / 4
when txn_amt = 0
then prev_1 / 2
else 0
end) +
(case when txn_amt = 0 and next_1 = 0 and next_2 = 0
then next_3 / 8
when txn_amt = 0 and next_1 = 0
then next_2 / 4
when txn_amt = 0
then next_1 / 2
else 0
end) +
txn_amt
) as new_txn_amt
from (select t.*,
lag(txn_amount) over (order by txn_dt) as prev_1,
lag(txn_amount, 2) over (order by txn_dt) as prev_2,
lag(txn_amount, 3) over (order by txn_dt) as prev_3,
lead(txn_amount) over (order by txn_dt) as next_1,
lead(txn_amount, 2) over (order by txn_dt) as next_2,
lead(txn_amount, 3) over (order by txn_dt) as next_3
from t
) t
Upvotes: 1