Reputation: 11
I need help multiplying the values from data set 2 to data set 1.
Data#1:
policy# | Risk# | Premium |
---|---|---|
KOK1 | 002 | 150 |
KOK2 | 003 | 130 |
Data#2:
Source | policy# | Risk# | Item1 |
---|---|---|---|
ageofbuild | KOK1 | 002 | 3 |
yearofbuild | KOK1 | 002 | 5 |
Discount1 | KOK1 | 002 | 10% |
Discount2 | KOK1 | 002 | 5% |
ageofbuild | KOK2 | 003 | 4 |
yearofbuild | KOK2 | 003 | 6 |
Discount1 | KOK2 | 003 | 15% |
Discount2 | KOK2 | 003 | 5% |
How to set up a formula to match the policy# and risk# from data#1 to data#2 and multiply the "premium" in data#1 with discount1 and discount2 from data#2 if they match the policy# and risk#?
Upvotes: 0
Views: 463
Reputation: 12909
There are two ways to do this. The first way is a simple merge where you merge the dataset by policy and risk #, then perform your calculations. For example:
data want;
merge data2(in=d2)
data1(in=d1);
by policy risk_nbr;
/* If policy and risk_nbr match from data2 and data1, then calculate
a premium */
if(d2 AND d1 AND find(source, 'Discount') ) then value = Premium*Item1;
run;
This is similar to a full join on policy, risk_nbr
in SQL, but only multiplying if the two key values match. Note that both datasets must be sorted by policy
and risk_nbr
for this to work.
The second way is through a hash table lookup, which is one of my favorite ways of doing these small lookup tables. They're really fast.
Think of a hash table as an independent table that floats out in memory. We're going to talk to it using special methods that look up a value in the hash table by a key in our dataset and pull that value down so we can use it. Here's what that looks like.
data want;
/* Only load the hash table once */
if(_N_ = 1) then do;
dcl hash h(dataset: 'data2'); *Add a dataset to a hash table called 'h';
h.defineKey('policy', 'risk'); *Define our lookup key;
h.defineData('premium'); *The value we want to pull;
h.defineDone(); *Load the dataset into `h`;
/* Initialize the numeric variable 'premium' with a missing value
since it does not exist yet. This prevents data step warnings. */
call missing(premium);
end;
/* Look up the value of policy and risk in the set dataset and compare it
with the hash table's value of policy and risk.
If there is a match, rc = 0
*/
rc = h.Find();
if(rc = 0 AND find(source, 'Discount') ) then value = Premium*Item1;
drop rc;
run;
Hash tables are extremely powerful and very fast, especially if you are joining a small table with a large table. You don't need to do any pre-sorting, either.
If you want to learn more about hash tables, check out the paper I cut my processing time by 90% using hash tables - you can too!
Upvotes: 1