Qbik
Qbik

Reputation: 6147

Hash object in SAS - is it possible to merge two tables below using hash object?

Is it possible to merge below two tables using hash object in SAS 9.1 example below ? The main problemseems to be creation of Value variable w Result dataset. Problem is that each payment could pay for more than one charge, and sometimes more than one payment is need to pay for one charge and this tho cases could appear simultaneously. Does it problem has some general name ? http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf

data TABLE1;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      500
;;;;
run;


data TABLE2;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data OUT;
input ID_client     ID_commodity    ID_ofpayment    value;
datalines;
1               111111111             11    50
1               111111111             12    50
1               222222222             13    100
1               222222222             14    50
1               222222222             15    50
2               333333333             21    300
2               444444444             21    200
2               444444444             22    200
2               555555555             23    100
2               555555555             24    200
2               555555555             25    200

Upvotes: 0

Views: 752

Answers (1)

Joe
Joe

Reputation: 63424

This might work for you - I have 9.2 and 9.2 has some significant hash improvements, but I think I behaved myself and only used what was there in 9.1. You might try crossposting this to SAS-L [SAS listserv] as Paul Dorfman (ie, The Hash Guru) reads that still I believe.

I assumed you want the 'leftovers' posted out. You may need to work on that part, if it's not working the way you want. This isn't terribly well tested, it works for your example dataset. I call missing the commodity for 24 and 25 since they're not used for that.

I'm pretty sure there's a more clean way to do the iteration than what I do, but since 9.2+ is what I use and we have multidata available, i've always used that instead of hash iterators so I don't know the cleaner methods.

data have;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      50
;;;;
run;


data for_hash;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
  format id_client_hash paymentValue id_ofpayment BEST12.;
  declare hash h(dataset:'for_hash' , ordered: 'a');
  h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
  h.defineData('id_client_hash','id_ofpayment','paymentValue');
  call missing(id_ofpayment,paymentValue, id_client_hash);
  h.defineDone();
  declare hiter hi('h');
end;

do _t = 1 by 1 until (last.id_client);
 set have;
 by id_client;

 *Iterate through the hash and find the first record with the same ID_client;
 do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
   rc = hi.next();
 end;

 *For the current charge record, iterate through the payment (hash) until all paid up.;
 do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
   if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
     value = paymentValue; *so whole paymentValue is value;
     charge = charge - paymentValue; *charge is decremented by paymentValue;
     output; *output row;
     _id=ID_client_hash; 
     _pay=id_ofpayment;
     rc = hi.next();
    h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
   end;
   else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
     value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
     paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
     charge= 0; *charge is zero now;
     output; *output a row;
     h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
   end;
 end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
 if charge gt 0 then do;
   value=-1*charge;
   call missing(id_ofpayment);
   output; *output a row for the charge, which is not paid; 
 end;
 if last.id_client then do;  *this is cleanup, checking to see if we have any leftover payments;
   do while (rc=0); *iterate through the remaining hash;
     do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
       rc = hi.next();
     end;
     if rc=0 then do;
         call missing(id_commodity); *to make it clear this is a leftover payment;
         value=paymentValue; *update the value;
         output; *output the payment;
         _id=ID_client_hash;
         _pay=id_ofpayment;
         rc = hi.next();
         if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
     end;    
   end;
 end;
end;
keep id_client id_ofpayment id_commodity value;
run;

Among other things, this isn't terribly fast - I do a lot of iterating that might be wasteful. It will be relatively faster if you don't have any payment ID_client records that aren't represented in the charge records- any that you do are getting skipped over, so that could end up super slow.

I'm not confident hash is the superior solution, at least pre-9.2; keyed UPDATE might be superior. UPDATE is pretty much made for transactional database structures, which this seems close to.

Upvotes: 1

Related Questions