John
John

Reputation: 47

Merging datasets based on 2 variables in SAS

I'm working with different databases. All of them contain information about 1000+ companies. A company is defined by its ticker code (the short version of the name (i.e. Ford as F) usually seen on stock quotation boards).

Aside from the ticker code to merge on I also have to merge on the time. I used month as a count variable throughout my time series. The final purpose is to have a regression in the kind of

Y(jt) = c + X(jt) +X1(jt) etc with j = company (ticker) and t = time (month).

So imagine I have 2 databases, one of which is the base database with variables such as Tickers, months, betas of a company (risk measure) etc. and a second database which has an extra variable (let's say market capitalisation).

What I want to do then is to merge these 2 databases based on the ticker and the month.

Example: Base database:

Ticker ____ Month ____ Betas

AA ____ 4 ____ 1.2

BB ____ 8 ____ 1.18

Second database:

Ticker ____ Month ____ MCAP

AA ____ 4 ____ 8542

BB ____ 6 ____ 1245

Then after merge I would like to have something like this:

Ticker ____ Month ____ Betas ____ MCAP

AA ____ 4 ____ 1.2 ____ 8542

So all observations that do not match BOTH the date and ticker have to be dropped. I'm sure this is possible, just can't find the right type of code.

PS: I'm guessing the underscores have something to do with font layout but both the bold as italic is supposed to be normal :)

Upvotes: 0

Views: 29263

Answers (4)

sunil
sunil

Reputation: 9

proc sort data=database1;
by ticker month;
run;

proc sort data=database2;
by ticker month;
run;
data gh;
merge database1(in=a) database2(in=b);
by ticker month;
if a and b;
run;

Upvotes: 0

lkbiostat
lkbiostat

Reputation: 46

Agree with Jonathan... after sorting both datasets independently by ticker and time, the data step of merging is what I would use..... little modification

data want; 
   merge base(in = b) mcap(in = m); 
   by ticker time;
   if m & b; 
run;

Records that don't have common ticker and time in both datasets would be dropped automatically..

Upvotes: 3

Jonathan Goldberg
Jonathan Goldberg

Reputation: 41

Calling the two datasets base and mcap, and assuming that they have both been sorted by ticker and month, you can do it this way:

data want;
  merge base(in = b)
        mcap(in = m);
  if m & b;
run;

The subsetting if will not accept any row that does not match in bath datasets.

Upvotes: 2

John
John

Reputation: 47

Ok so it appears you can just do it very easily by:

proc sort data=work;
by ticker month;
run;
proc sort data=wsize;
by ticker month;
run;
data test;
merge work(in=a) wsize(in=b);
by ticker month;
frommerg=a;
fromwtvol=b;
run;
data test;
set test;
if frommerg=0 then delete;
run;
data test;
set test;
if fromwtvol = 0 then delete;
run;
data test;
set test;
drop frommerg fromwtvol;
run;

That's the code I used, I tried this before posting because I didn't want to look like a leecher but it so happens that the 2 databases i tried had nothing in common (what are the odds with 70.000 observations :D), I retried it and it works (for now!)

Thanks anyway!

Upvotes: 0

Related Questions