Reputation: 67
I am trying to randomize a numerical vector in a data frame with R. My data looks something like this:
user click
1025 0
1025 1
1025 0
1025 0
1025 0
1025 0
1025 1
1025 0
1025 0
1025 0
1025 0
14639 1
14639 0
14639 0
14639 1
11605 0
11605 0
14605 1
In the data some users appear more often than other users. I now want to replace each user ID with a new one. Let's say there are 100 unique user IDs: at the end, I still want to have 100 different unique user IDs.
I tried dplyr:
data %>% group_by(user) %>% mutate(anon = rep(sample(length(unique(data$user)), 1, replace = F)), n())
However, that doesn't work because the sampling is done separately for each user, ignoring the other users. As a result, some users end up with the same new user ID.
Can someone tell me how I can - at random - create a new user ID (that does not repeat) for each person in the data frame?
Upvotes: 1
Views: 69
Reputation: 389275
You can sample random numbers from 1 to n, where n is the number of unique user values. You can then assign a random id to each user.
set.seed(1234)
inds <- sample(length(unique(df$user)))
df$new_user <- inds[match(df$user, unique(df$user))]
df
# user click new_user
#1 1025 0 4
#2 1025 1 4
#3 1025 0 4
#4 1025 0 4
#5 1025 0 4
#6 1025 0 4
#7 1025 1 4
#8 1025 0 4
#9 1025 0 4
#10 1025 0 4
#11 1025 0 4
#12 14639 1 2
#13 14639 0 2
#14 14639 0 2
#15 14639 1 2
#16 11605 0 3
#17 11605 0 3
#18 14605 1 1
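The same idea can also be written by shuffling the unique user values once and using each row's position in that shuffled vector as the new id. This is only a minimal sketch: the data frame is rebuilt from the question's sample, the names `users` and `lookup` are introduced here for illustration, and the ids it produces will not necessarily match the output above even with the same seed.

```r
# Sample data from the question
df <- data.frame(
  user  = c(rep(1025, 11), rep(14639, 4), rep(11605, 2), 14605),
  click = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1)
)

set.seed(1234)
users <- unique(df$user)
# Shuffle the unique users once; sample.int() avoids the sample(x)
# surprise when x happens to be a single number
lookup <- users[sample.int(length(users))]
# The position of each row's user in the shuffled vector is its new id
df$new_user <- match(df$user, lookup)

# Sanity checks: every user keeps a single id, and no two users share one
stopifnot(length(unique(df$new_user)) == length(users))
stopifnot(all(tapply(df$new_user, df$user, function(x) length(unique(x))) == 1))
```

Because the shuffle happens once for the whole column rather than once per user, the new ids are a permutation of 1 to n and cannot collide.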
Upvotes: 0
Reputation: 320
I would solve this by first generating some user IDs, then creating a temporary tibble that associates existing with new user IDs, then joining your previous data with this table:
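library(dplyr)
# Randomly generate some user IDs (a random permutation of 1..n)
new_user_ids = sample(seq(1, length(unique(data$user))))
# Join each row to its new ID, then overwrite the old user column
data %>%
left_join(tibble(user = unique(data$user), new.user = new_user_ids), by = "user") %>%
mutate(user = new.user) %>% select(-new.user)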
This gives the following result, for example:
user click
<int> <dbl>
1 3 0
2 3 1
3 3 0
4 3 0
5 3 0
6 3 0
7 3 1
8 3 0
9 3 0
10 3 0
11 3 0
12 2 1
13 2 0
14 2 0
15 2 1
16 4 0
17 4 0
18 1 1
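To see why the lookup has to be built once for all users rather than inside each group, here is a small base R sketch. It assumes 100 users, as in the question, and the names `per_group` and `joint` are made up for illustration. Drawing one number per user independently behaves like sampling with replacement, so collisions are possible, while a single joint draw is a permutation.

```r
set.seed(1)
n <- 100  # assumed number of unique users, as in the question

# One independent draw per user, which is what sampling inside each group amounts to
per_group <- replicate(n, sample.int(n, 1))

# One joint draw without replacement for all users at once
joint <- sample.int(n)

length(unique(per_group))  # usually well below n, so some ids are shared
length(unique(joint))      # always n, so every id is distinct
```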
Upvotes: 1