Reputation: 107
I have a table on which some data is missing, so I have to replace the missing data by the data of the day before.
I wanted to do a SQL update to solve this issue.
In case the following criteria is met -> if ID1 is in set(a1,a2,a3) AND TYPE Is MISSING
The variables "Amount" a/b will have an absurd value
We will then take the Amount a/b value of the row of the previous day, where ID1 and ID2 are the same as the one of the row where the criteria is met.
So here ID1 and ID2 are equal to a1 and b1 respectively, we look for a1 and b1 the previous day (10/03/2021) and get the amount 28.45/29.46 that we use to replace the false amount 454848.25/548926.36.
We also copy Type value.
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 454848.25 | 548926.36 | 11/03/2021 | /MISSING/ |
Goal:
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 28.45 | 29.46 | 11/03/2021 | Out |
My table consists of thousands of rows, but it's the idea
I tried to use lag and SQL update, but did not succeed.
Upvotes: 0
Views: 723
Reputation: 9109
This looks like simple LOCF. Thanks @kermit for data HAVE.
data have;
infile datalines delimiter='|';
input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
format day ddmmyy10.;
datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;;;;
run;
proc sort data=have;
by id1 id2;
run;
data LOCF;
update have(obs=0) have;
by id1 id2;
output;
run;
proc print;
run;
Upvotes: 1
Reputation: 585
Why not just sort by the keys and date and then use the lag
function to look one row back. Maybe you already tried this, but used the lag
only inside the if type is missing
block. This won't help you as noted in the documentation
Storing values at the bottom of the queue and returning values from the top of the queue occurs only when the function is executed. An occurrence of the LAG n function that is executed conditionally stores and return values only from the observations for which the condition is satisfied.
Instead, compute the lag in every row, not just in those satisfying the condition.
proc sort data=have;
by ID1 ID2 day;
run;
data want;
set have;
by ID1 ID2;
lag_amount_a = lag(amount_a);
lag_amount_b = lag(amount_b);
lag_day = lag(day);
lag_type = lag(type);
if ID1 in ("a1", "a2", "a3") and missing(type) then do;
// check if row before matches ID1, ID2 and day - 1
if not first.ID2 and day = lag_day + 1 then do;
amount_a = lag_amount_a;
amount_b = lag_amount_b;
type = lag_type;
end;
end;
run;
Upvotes: 1
Reputation: 3117
If the condition is met, replace absurd values by proper missing values and assign a pointer to the day before. Then lookup using the initial table.
data have;
infile datalines delimiter='|';
input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
format day ddmmyy10.;
datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;
data stage1;
set have;
if ID1 in ('a1','a2','a3') and type = "" then do;
Amount_A = .;
Amount_B = .;
_date = day - 1;
type = "Out";
end;
format _date ddmmyy10.;
run;
data want;
if 0 then set have;
if _n_ = 1 then do;
declare hash h(dataset:'have');
h.definekey('ID1','ID2','day');
h.definedata('Amount_A','Amount_B');
h.definedone();
end;
set stage1;
rc = h.find(key:ID1, key:ID2, key: _date);
drop rc _date;
run;
ID1 ID2 Amount_A Amount_B day type
a1 b1 28.45 29.46 10/03/2021 Out
a2 b1 36.84 37.88 10/03/2021 In
a1 b1 28.45 29.46 11/03/2021 Out
Upvotes: 1