Sheri
Sheri

Reputation: 11

Lookup to match from one dataset to another

I need help multiplying the values from data set 2 to data set 1.

Data#1:

policy# Risk# Premium
KOK1 002 150
KOK2 003 130

Data#2:

Source policy# Risk# Item1
ageofbuild KOK1 002 3
yearofbuild KOK1 002 5
Discount1 KOK1 002 10%
Discount2 KOK1 002 5%
ageofbuild KOK2 003 4
yearofbuild KOK2 003 6
Discount1 KOK2 003 15%
Discount2 KOK2 003 5%

How to set up a formula to match the policy# and risk# from data#1 to data#2 and multiply the "premium" in data#1 with discount1 and discount2 from data#2 if they match the policy# and risk#?

Upvotes: 0

Views: 463

Answers (1)

Stu Sztukowski
Stu Sztukowski

Reputation: 12909

There are two ways to do this. The first way is a simple merge where you merge the dataset by policy and risk #, then perform your calculations. For example:

data want;
    merge data2(in=d2)
          data1(in=d1);
    by policy risk_nbr;

    /* If policy and risk_nbr match from data2 and data1, then calculate
       a premium */
    if(d2 AND d1 AND find(source, 'Discount') ) then value = Premium*Item1;
   
run;

This is similar to a full join on policy, risk_nbr in SQL, but only multiplying if the two key values match. Note that both datasets must be sorted by policy and risk_nbr for this to work.

The second way is through a hash table lookup, which is one of my favorite ways of doing these small lookup tables. They're really fast.

Think of a hash table as an independent table that floats out in memory. We're going to talk to it using special methods that look up a value in the hash table by a key in our dataset and pull that value down so we can use it. Here's what that looks like.

data want;

    /* Only load the hash table once */
    if(_N_ = 1) then do;
        dcl hash h(dataset: 'data2');      *Add a dataset to a hash table called 'h';
            h.defineKey('policy', 'risk'); *Define our lookup key;
            h.defineData('premium');       *The value we want to pull;
        h.defineDone();                    *Load the dataset into `h`;
      
        /* Initialize the numeric variable 'premium' with a missing value 
           since it does not exist yet. This prevents data step warnings. */
        call missing(premium); 
    end;
    
    /* Look up the value of policy and risk in the set dataset and compare it 
       with the hash table's value of policy and risk.
       If there is a match, rc = 0
    */
    rc = h.Find();

    if(rc = 0 AND find(source, 'Discount') ) then value = Premium*Item1;

    drop rc;
run;

Hash tables are extremely powerful and very fast, especially if you are joining a small table with a large table. You don't need to do any pre-sorting, either.

If you want to learn more about hash tables, check out the paper I cut my processing time by 90% using hash tables - you can too!

Upvotes: 1

Related Questions