zzz247
zzz247

Reputation: 107

Query data from previous row and update current row

I have a table on which some data is missing, so I have to replace the missing data by the data of the day before.

I wanted to do a SQL update to solve this issue.

In case the following criteria is met -> if ID1 is in set(a1,a2,a3) AND TYPE Is MISSING

The variables "Amount" a/b will have an absurd value

We will then take the Amount a/b value of the row of the previous day, where ID1 and ID2 are the same as the one of the row where the criteria is met.

So here ID1 and ID2 are equal to a1 and b1 respectively, we look for a1 and b1 the previous day (10/03/2021) and get the amount 28.45/29.46 that we use to replace the false amount 454848.25/548926.36.

We also copy Type value.

ID1 ID2 Amount a Amount b day Type
a1 b1 28.45 29.46 10/03/2021 Out
a2 b1 36.84 37.88 10/03/2021 In
a1 b1 454848.25 548926.36 11/03/2021 /MISSING/

Goal:

ID1 ID2 Amount a Amount b day Type
a1 b1 28.45 29.46 10/03/2021 Out
a2 b1 36.84 37.88 10/03/2021 In
a1 b1 28.45 29.46 11/03/2021 Out

My table consists of thousands of rows, but it's the idea

I tried to use lag and SQL update, but did not succeed.

Upvotes: 0

Views: 723

Answers (3)

data _null_
data _null_

Reputation: 9109

This looks like simple LOCF. Thanks @kermit for data HAVE.

data have;
   infile datalines delimiter='|';
   input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
   format day ddmmyy10.;
   datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;;;;
   run;
proc sort data=have;
   by id1 id2;
   run;
data LOCF;
   update have(obs=0) have;
   by id1 id2;
   output;
   run;
proc print;
   run;

enter image description here

Upvotes: 1

deristnochda
deristnochda

Reputation: 585

Why not just sort by the keys and date and then use the lag function to look one row back. Maybe you already tried this, but used the lag only inside the if type is missing block. This won't help you as noted in the documentation

Storing values at the bottom of the queue and returning values from the top of the queue occurs only when the function is executed. An occurrence of the LAG n function that is executed conditionally stores and return values only from the observations for which the condition is satisfied.

Instead, compute the lag in every row, not just in those satisfying the condition.

proc sort data=have;
  by ID1 ID2 day;
run;

data want;
  set have;
  by ID1 ID2;
  lag_amount_a = lag(amount_a);
  lag_amount_b = lag(amount_b);
  lag_day = lag(day);
  lag_type = lag(type);
  if ID1 in ("a1", "a2", "a3") and missing(type) then do;
    // check if row before matches ID1, ID2 and day - 1
    if not first.ID2 and day = lag_day + 1 then do;
      amount_a = lag_amount_a;
      amount_b = lag_amount_b;
      type = lag_type;
    end;
  end;
run;

Upvotes: 1

Kermit
Kermit

Reputation: 3117

If the condition is met, replace absurd values by proper missing values and assign a pointer to the day before. Then lookup using the initial table.

data have;
infile datalines delimiter='|';
input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
format day ddmmyy10.;
datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;

data stage1;
set have;
if ID1 in ('a1','a2','a3') and type = "" then do;
    Amount_A = .;
    Amount_B = .;
    _date = day - 1;
    type = "Out";
end;
format _date ddmmyy10.;
run;

data want;
    if 0 then set have;
    if _n_ = 1 then do;
        declare hash h(dataset:'have');
        h.definekey('ID1','ID2','day');
        h.definedata('Amount_A','Amount_B');
        h.definedone();
    end;
    
    set stage1;
    rc = h.find(key:ID1, key:ID2, key: _date);
    drop rc _date;
run;
ID1 ID2 Amount_A Amount_B    day      type
a1  b1  28.45     29.46   10/03/2021  Out
a2  b1  36.84     37.88   10/03/2021  In
a1  b1  28.45     29.46   11/03/2021  Out

Upvotes: 1

Related Questions