Reputation: 699
I'm de-identifying some things. I have a dataset with my identifiers (ID1 and ID2) For each ID1 there are multiple ID2s I found the number of unique ID1s and created random numbers using ranuni and seed and output them into a dataset. I did the same for ID2. Now that I have ID1 and ID2 I want to create an excel file that will output ID1 ID2 de-identified ID1 and de-identified ID2. I was going to merge but I don't have anything to merge on (except observation). So I'm quite confused. Suggestions would be much appreciated.
EDIT: Example data set
data real_data;
input ID1 ID2;
datalines;
1 11
1 12
1 13
1 14
1 15
2 11
3 13
3 14
3 17
;;;;
run;
Upvotes: 1
Views: 242
Reputation: 63424
Side by side merge wouldn't work here because you have fewer ID1 than ID2. Unless you can easily make a dataset of the combination of id1 and id2, you can't really do this.
First off, I wouldn't do the deidentification this way. You can do it within the same datastep. Basically, create a random sorting parameter (two, one each for id1 and id2), then sort by that, then assign sequential IDs at that point. Sequential IDs applied to a randomly sorted dataset are de-identified by nature.
If you want to do it with a separate dataset, I would construct your random IDs in a format that converted a sequential 1:num(ID_1) to ID1_DEID and same for ID2. Then make a counter that increments by one for each ID1 and another for each ID2, and in each case PUT the counter with the format, generating the de_id value.
Upvotes: 1