py14ak
py14ak

Reputation: 31

Merging 2 data seta in what I understand is one-to many merge SAS

I have 2 data sets:

|ColA|ColB
|date1|1
|date2|2
|date3|3

|cola|colb|colc.....|colz|
|blaa|blaa|blaaaaaaa|date1
|blab|blab|blaaaaaab|date1
|blac|blac|blaaaaaac|date1
|blaa|blaa|blaaaaaaa|date2
|blag|blag|blaaaaaag|date3

What I want is for every date the number across it to equal that of the first table

|cola|colb|colc.....|colz |ColZ+1
|blaa|blaa|blaaaaaaa|date1|1
|blab|blab|blaaaaaab|date1|1
|blac|blac|blaaaaaac|date1|1
|blaa|blaa|blaaaaaaa|date2|2
|blag|blag|blaaaaaag|date3|3

What I've tried:

 proc sql;
create table merged_dpd_raw_2602_B
as select a.*, b.delnum 
from merged_dpd_raw_2602 a
left join work.date_transform b
on datepart(a.reporting_date) = datepart(b.DATE)
;quit;

It only adds a number to the last month. all other months have a "." .

proc sort data=date_transform;
by formatted_date;
run;
proc sort data=merged_dpd_raw_2602;
by formatted_date;
run;

data merged_dpd_raw_2602_B;
 merge date_transform(in=A) merged_dpd_raw_2602(in=B);
 by  formatted_date;
 if A=1 and B=1;

 run;

Correctly merges but I somehow loose 310k data entries.

Upvotes: 0

Views: 287

Answers (2)

Reeza
Reeza

Reputation: 21294

I suspect you're not trying to merge on actual dates but a month so you're using a formatted value most likely. So set your dates to the beginning of the month or convert them to YYMM to merge.

proc sql;
create table merged_dpd_raw_2602_B
as select a.*, b.delnum 
from merged_dpd_raw_2602 a
left join work.date_transform b
on intnx('month', datepart(a.reporting_date), 0, 'b') = intnx('month', datepart(b.DATE), 0, 'b')
;quit;

Upvotes: 2

Joe
Joe

Reputation: 63434

So the way I would debug this is the following.

Use the data step merge, for one - it's capable of handling many-to-one fine.

data merged_dpd_raw_2602_B nomatch;
 merge date_transform(in=__date) merged_dpd_raw_2602(in=__raw);
 by  formatted_date;
 if __date and __raw then output merged_dbd_raw_2602_b;
 else do;
    invar = ifn(__date,'in date','in raw ');
    output nomatch;
 end;
run;

That shows you the records that aren't matching. And importantly, the way the data step merge works, it might show you the records in the right order (but maybe not, depending on why the merge is not working).

The "in date" records you may not care as much about, it may just be the "in raw " variables that you care about - assuming I understand which one is the one with most of the records. Look at the records that don't match, and then try to figure out what they should've matched to (the record may be in there from "in date" but it might not be, if the "in date" record matched a different "in raw " record).

This lets you see exactly what is failing in your data.

Upvotes: 1

Related Questions