Randomize per group

Question

I am trying to randomize a numerical vector in a data frame with R. My data looks something like this:

In the data, some users appear more often than others. I now want to replace each user ID with a new one. Let's say there are 100 unique user IDs. At the end, I still want to have 100 unique user IDs.

I tried dplyr:

data %>% group_by(user) %>% mutate(anon = rep(sample(length(unique(data$user)), 1, replace = F), n()))

However, that doesn't work because the sampling is done separately for each user, ignoring the other users. As a result, some users end up with the same new user ID.

Can someone tell me how I can randomly create a new, non-repeating user ID for each person in the data frame?

shizundeiku · Accepted Answer

I would solve this by first generating some user IDs, then creating a temporary tibble that associates existing with new user IDs, then joining your previous data with this table:

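library(dplyr)

# Randomly generate new user IDs: a random permutation of 1..k,
# where k is the number of unique users
new_user_ids <- sample(length(unique(data$user)))

# Build a lookup table (old ID -> new ID), join it on, then overwrite the old ID
data %>%
  left_join(tibble(user = unique(data$user), new.user = new_user_ids), by = "user") %>%
  mutate(user = new.user) %>%
  select(-new.user)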

This gives the following result, for example:

    user click
    
 1     3     0
 2     3     1
 3     3     0
 4     3     0
 5     3     0
 6     3     0
 7     3     1
 8     3     0
 9     3     0
10     3     0
11     3     0
12     2     1
13     2     0
14     2     0
15     2     1
16     4     0
17     4     0
18     1     1
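
The same lookup idea can also be sketched in base R without a join. This is only an illustration: the toy data frame and the `anon` column name are assumptions made up for the example, not the data from the question. The key point is that `sample()` is called once for the whole data set rather than once per group, so the new IDs cannot collide.

```r
# Toy data (hypothetical): three users with unequal numbers of rows
data <- data.frame(
  user  = c(101, 101, 101, 205, 205, 317),
  click = c(0, 1, 0, 1, 0, 1)
)

set.seed(42)  # only to make the sketch reproducible

# One draw for the whole data set: a random permutation of 1..k,
# where k is the number of distinct users
old_ids <- unique(data$user)
new_ids <- sample(length(old_ids))

# match() returns, for each row, the position of its user in old_ids;
# that position indexes the corresponding new ID
data$anon <- new_ids[match(data$user, old_ids)]

# Sanity checks: the number of distinct IDs is unchanged,
# and each old ID maps to exactly one new ID
stopifnot(length(unique(data$anon)) == length(old_ids))
stopifnot(nrow(unique(data[c("user", "anon")])) == length(old_ids))
```

Keeping the old ID in `user` and writing the new one to a separate column makes the mapping easy to inspect; drop or overwrite `user` once you are satisfied with it.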
