I want to combine 2 datasets of numbers to another dataset without a common variable

Question

I'm de-identifying some things. I have a dataset with my identifiers (ID1 and ID2) For each ID1 there are multiple ID2s I found the number of unique ID1s and created random numbers using ranuni and seed and output them into a dataset. I did the same for ID2. Now that I have ID1 and ID2 I want to create an excel file that will output ID1 ID2 de-identified ID1 and de-identified ID2. I was going to merge but I don't have anything to merge on (except observation). So I'm quite confused. Suggestions would be much appreciated.

EDIT: Example data set

data real_data;
  input ID1  ID2;
  datalines;    
1   11    
1   12    
1   13    
1   14    
1   15    
2   11   
3   13
3   14    
3   17
;;;;
run;

Joe · Accepted Answer

Side by side merge wouldn't work here because you have fewer ID1 than ID2. Unless you can easily make a dataset of the combination of id1 and id2, you can't really do this.

First off, I wouldn't do the deidentification this way. You can do it within the same datastep. Basically, create a random sorting parameter (two, one each for id1 and id2), then sort by that, then assign sequential IDs at that point. Sequential IDs applied to a randomly sorted dataset are de-identified by nature.

If you want to do it with a separate dataset, I would construct your random IDs in a format that converted a sequential 1:num(ID_1) to ID1_DEID and same for ID2. Then make a counter that increments by one for each ID1 and another for each ID2, and in each case PUT the counter with the format, generating the de_id value.

I want to combine 2 datasets of numbers to another dataset without a common variable

Answers (1)

Related Questions