Victor
Victor

Reputation: 77

Window functions to smooth your missing data in postgresql

OLD title: way to calculate values for a sequence in a postgresql and greenplum

I have a table with two columns, say txn_dt and txn_amt. For example:

|txn_dt|txn_amt|
|----|----|
|2016-01-01|12.0|
|2016-01-02|0.|
|2016-01-03|0.|
|2016-01-04|0.|
|2016-01-05|0.|
|2016-01-06|10.0|
|2016-01-07|0.0|
|2016-01-08|20.0|
|2016-01-09|0.|
|2016-01-10|0.|
|2016-01-11|0.|
|2016-01-12|0.|
... ALL ZEROS ...
|2016-01-20|0.|
|2016-01-21|0.|
|2016-01-22|0.|
|2016-01-23|0.|
|2016-01-24|12.0|

My intention is to obtain the following values saved:

|txn_dt|txn_amt|
|----|----|
|2016-01-01|12.0|
|2016-01-02|6.|
|2016-01-03|3. + 1.25|
|2016-01-04|1.5 + 2.5|
|2016-01-05|5.|
|2016-01-06|10.0|
|2016-01-07|5.0 + 10.0|
|2016-01-08|20.0|
|2016-01-09|10.|
|2016-01-10|5.|
|2016-01-11|2.5|
|2016-01-12|0.|
... ALL ZEROS ...
|2016-01-20|0.|
|2016-01-21|1.5|
|2016-01-22|3.|
|2016-01-23|6.|
|2016-01-24|12.0|

Let's say A[0 to 23] = [12,0,0,0,0,10,0,20,0,...,0,12] representing the above values from the txn_amt. I need to set the window size to 3, then I will calculate like

A[1] = A[0]/2
A[2] = A[1]/2
A[3] = A[2]/2

that means

A[i+1] = 0.5*A[i], if A[i+1]==0

We only change up to 3 zero values by the above formula. That is to say, if A[i]>0, A[i+1]=0, but A[i+2]>0, then A[i+1]=0.5*A[i], A[i+2] will not be changed.

After that, I also need to reverse this array A[0 to 23] to B[0 to 23] where B[0]==A[23],...,B[23]==A[0], and do the above procedure again; then add values point by point in both arrays.

My question is how can I use postgresql to implement it? Can someone please help?

Thanks to Gordon, I updated the code as follows:

select t.*,
       cast((case when txn_amt = 0 and prev_1 = 0 and prev_2 = 0
              then prev_3 / 8
              when txn_amt = 0 and prev_1 = 0
              then prev_2 / 4
              when txn_amt = 0
              then prev_1 / 2
              else 0
         end) +
        (case when txn_amt = 0 and next_1 = 0 and next_2 = 0
              then next_3 / 8
              when txn_amt = 0 and next_1 = 0
              then next_2 / 4
              when txn_amt = 0
              then next_1 / 2
              else 0
         end) +
        txn_amt
       as numeric(18,2)) as new_txn_amt
from (select b.txn_dt, a.txn_amt,
             COALESCE( lag(txn_amt) over (order by b.txn_dt),0.0) as prev_1,
             COALESCE( lag(txn_amt, 2) over (order by b.txn_dt),0.0) as prev_2,
             COALESCE( lag(txn_amt, 3) over (order by b.txn_dt),0.0) as prev_3,
             COALESCE( lead(txn_amt) over (order by b.txn_dt),0.0) as next_1,
             COALESCE( lead(txn_amt, 2) over (order by b.txn_dt),0.0) as next_2,
             COALESCE( lead(txn_amt, 3) over (order by b.txn_dt),0.0) as next_3
      from test_txns a
      right join (select * from generate_series('2016-01-01'::date,'2016-01-31'::date,'1day') as txn_dt) b
      on a.txn_dt = b.txn_dt
      order by b.txn_dt
     ) t;

Upvotes: 0

Views: 253

Answers (2)

santosvella
santosvella

Reputation: 21

A bit late i know. I found your solution okay but a bit of a pain to edit easily. I find using AVG() , PRECEDING and FOLLOWING does the same job with less code

    SELECT  
        txn_dt ,
        AVG(txn_amt) OVER(ORDER BY txn_dt desc ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS avg_txn_amt 
    FROM test_txns 
    ORDER BY txn_dt 

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270463

This seems like a painful calculation, but you can use lead() and lag() and a bunch of arithmetic:

select t.*,
       ((case when txn_amt = 0 and prev_1 = 0 and prev_2 = 0
              then prev_3 / 8
              when txn_amt = 0 and prev_1 = 0
              then prev_2 / 4
              when txn_amt = 0 
              then prev_1 / 2
              else 0
         end) +
        (case when txn_amt = 0 and next_1 = 0 and next_2 = 0
              then next_3 / 8
              when txn_amt = 0 and next_1 = 0
              then next_2 / 4
              when txn_amt = 0 
              then next_1 / 2
              else 0
         end) +
        txn_amt
       ) as new_txn_amt        
from (select t.*,
             lag(txn_amount) over (order by txn_dt) as prev_1,
             lag(txn_amount, 2) over (order by txn_dt) as prev_2,
             lag(txn_amount, 3) over (order by txn_dt) as prev_3,
             lead(txn_amount) over (order by txn_dt) as next_1,
             lead(txn_amount, 2) over (order by txn_dt) as next_2,
             lead(txn_amount, 3) over (order by txn_dt) as next_3
      from t
     ) t

Upvotes: 1

Related Questions