Reputation: 735
I have a bunch of different participant ID variables that I need to randomise for privacy purposes. However, I need the ID variables for grouping purposes in my analyses. So, I need new, 'random' IDs that are still unique to each participant.
An example of the result I want:
ids = c('A4579', 'A5219', 'A7832', 'A1650', 'A5219', 'A7832')
would become random_ids = c('5d6y8', 'u537h', '0v65j', 'o4g4h', 'u537h', '0v65j')
Notice that in both the ids vector and the new random_ids vector, the second and fifth IDs match, as do the third and sixth.
What is the best (i.e., 'most random') way of creating these new 'randomised' IDs? I put 'randomised' in scare quotes because I am not sure how random these can really be...
Thanks in advance!
Upvotes: 1
Views: 32
Reputation: 887501
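Here is an option using match():
library(stringi)  # stri_rand_strings()
library(dplyr)    # n_distinct()
# Draw one random string per distinct ID, then use match() to expand
# the strings back out to the original order of ids
stri_rand_strings(n_distinct(ids), nchar(ids[1]),
                  '[A-Z0-9]')[match(ids, unique(ids))]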
#[1] "ZPRCV" "UTK2O" "QP6AN" "0HVLB" "UTK2O" "QP6AN"
# Elements 2 and 5 match, as do elements 3 and 6.
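One thing to watch for with any random-string approach is that two different participants could, by chance, be given the same string. That is very unlikely with 5 characters drawn from 36, but not impossible. Below is a minimal base R sketch that redraws until every distinct ID has its own code. The function name randomise_ids and its arguments len and chars are made up for illustration and are not part of any package.

```r
# Hypothetical helper (not from any package): map each distinct ID to a
# unique random string, so that repeated IDs receive the same string.
randomise_ids <- function(ids, len = 5, chars = c(LETTERS, 0:9)) {
  uniq <- unique(ids)
  # There must be at least as many possible codes as distinct IDs
  stopifnot(length(chars)^len >= length(uniq))
  codes <- character(0)
  # Keep drawing until there are enough distinct codes (handles collisions)
  while (length(codes) < length(uniq)) {
    draw <- replicate(
      length(uniq) - length(codes),
      paste(sample(chars, len, replace = TRUE), collapse = "")
    )
    codes <- unique(c(codes, draw))
  }
  # Expand the per-participant codes back to the original order
  codes[match(ids, uniq)]
}

ids <- c('A4579', 'A5219', 'A7832', 'A1650', 'A5219', 'A7832')
random_ids <- randomise_ids(ids)
```

The strings are pseudo-random, so calling set.seed() with a published seed would let someone regenerate the same codes. For privacy it is probably safer to leave the seed unset, or to keep the seed and the lookup between ids and random_ids private.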
Upvotes: 1