Reputation: 6147
Is it possible to merge below two tables using hash object in SAS 9.1 example below ? The main problemseems to be creation of Value
variable w Result dataset. Problem is that each payment could pay for more than one charge, and sometimes more than one payment is need to pay for one charge and this tho cases could appear simultaneously. Does it problem has some general name ?
http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf
data TABLE1;
input ID_client ID_commodity Charge;
datalines;
1 111111111 100
1 222222222 200
2 333333333 300
2 444444444 400
2 555555555 500
;;;;
run;
data TABLE2;
input ID_client_hash ID_ofpayment paymentValue;
datalines;
1 11 50
1 12 50
1 13 100
1 14 50
1 15 50
2 21 500
2 22 200
2 23 100
2 24 200
2 25 200
;;;;
run;
data OUT;
input ID_client ID_commodity ID_ofpayment value;
datalines;
1 111111111 11 50
1 111111111 12 50
1 222222222 13 100
1 222222222 14 50
1 222222222 15 50
2 333333333 21 300
2 444444444 21 200
2 444444444 22 200
2 555555555 23 100
2 555555555 24 200
2 555555555 25 200
Upvotes: 0
Views: 752
Reputation: 63424
This might work for you - I have 9.2 and 9.2 has some significant hash improvements, but I think I behaved myself and only used what was there in 9.1. You might try crossposting this to SAS-L [SAS listserv] as Paul Dorfman (ie, The Hash Guru) reads that still I believe.
I assumed you want the 'leftovers' posted out. You may need to work on that part, if it's not working the way you want. This isn't terribly well tested, it works for your example dataset. I call missing the commodity for 24 and 25 since they're not used for that.
I'm pretty sure there's a more clean way to do the iteration than what I do, but since 9.2+ is what I use and we have multidata available, i've always used that instead of hash iterators so I don't know the cleaner methods.
data have;
input ID_client ID_commodity Charge;
datalines;
1 111111111 100
1 222222222 200
2 333333333 300
2 444444444 400
2 555555555 50
;;;;
run;
data for_hash;
input ID_client_hash ID_ofpayment paymentValue;
datalines;
1 11 50
1 12 50
1 13 100
1 14 50
1 15 50
2 21 500
2 22 200
2 23 100
2 24 200
2 25 200
;;;;
run;
data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
format id_client_hash paymentValue id_ofpayment BEST12.;
declare hash h(dataset:'for_hash' , ordered: 'a');
h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
h.defineData('id_client_hash','id_ofpayment','paymentValue');
call missing(id_ofpayment,paymentValue, id_client_hash);
h.defineDone();
declare hiter hi('h');
end;
do _t = 1 by 1 until (last.id_client);
set have;
by id_client;
*Iterate through the hash and find the first record with the same ID_client;
do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
rc = hi.next();
end;
*For the current charge record, iterate through the payment (hash) until all paid up.;
do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
value = paymentValue; *so whole paymentValue is value;
charge = charge - paymentValue; *charge is decremented by paymentValue;
output; *output row;
_id=ID_client_hash;
_pay=id_ofpayment;
rc = hi.next();
h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
end;
else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
charge= 0; *charge is zero now;
output; *output a row;
h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
end;
end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
if charge gt 0 then do;
value=-1*charge;
call missing(id_ofpayment);
output; *output a row for the charge, which is not paid;
end;
if last.id_client then do; *this is cleanup, checking to see if we have any leftover payments;
do while (rc=0); *iterate through the remaining hash;
do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
rc = hi.next();
end;
if rc=0 then do;
call missing(id_commodity); *to make it clear this is a leftover payment;
value=paymentValue; *update the value;
output; *output the payment;
_id=ID_client_hash;
_pay=id_ofpayment;
rc = hi.next();
if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
end;
end;
end;
end;
keep id_client id_ofpayment id_commodity value;
run;
Among other things, this isn't terribly fast - I do a lot of iterating that might be wasteful. It will be relatively faster if you don't have any payment ID_client records that aren't represented in the charge records- any that you do are getting skipped over, so that could end up super slow.
I'm not confident hash is the superior solution, at least pre-9.2; keyed UPDATE might be superior. UPDATE is pretty much made for transactional database structures, which this seems close to.
Upvotes: 1