I create the data with:
library(dplyr)
d <- tibble(ID = rep(sample(500), each = 20))
I want to create a new column that gives every block of 5 consecutive unique IDs its own shared value. In this example it is easy, because each ID appears a fixed number of times (20 rows), so simply:
d = d %>% mutate(new_col = rep(sample(100), each = 100))
labels each block of 5 consecutive unique IDs. However, in my real data each ID does not always appear exactly 20 times. I didn't include the code that generates it, since it relies on other long functions.
My question is simply: given the ID column, how can I take each block of 5 consecutive unique IDs and give it a shared value in a new column? I believe group_by might be helpful, but I am not sure how to use it.
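To make the variable-length case concrete, here is a minimal sketch of data shaped like the question describes. The real generator is not shown, so the run lengths drawn with sample(5:30, ...) are a hypothetical stand-in; the only property that matters is that each ID occupies one contiguous run of varying length.

```r
library(dplyr)

set.seed(1)  # make the random example reproducible
# Hypothetical stand-in for the real generator (not shown in the question):
# 500 shuffled IDs, each repeated a random number of times between 5 and 30.
ids  <- sample(500)
lens <- sample(5:30, length(ids), replace = TRUE)
d    <- tibble(ID = rep(ids, times = lens))

# Run lengths now vary, so a block of 5 IDs no longer spans exactly 100 rows
# and rep(sample(100), each = 100) no longer lines up with the ID boundaries.
head(rle(d$ID)$lengths)
```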
You might need this:
d <- d %>% mutate(new_col = cumsum(ID - lag(ID, default = first(ID)) != 0) %/% 5)
Basically, ID - lag(ID, default = first(ID)) != 0 evaluates to TRUE whenever the ID changes. Doing a cumsum on that logical vector gives a run-length id of the ID column (like rleid; take a look at this answer for more info), such as 0 0 0 1 1 1 2 2 2. Since you want every five consecutive IDs to share the same value in the new column, do an integer division by 5 (%/%).
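The three steps above can be traced on a small toy vector. The ID values below are made up; they only need to form runs of unequal length. Note that the counter starts at 0, whereas data.table::rleid() starts at 1.

```r
library(dplyr)

# Toy ID column with runs of unequal length (values are arbitrary).
ID <- c(7, 7, 3, 3, 3, 9, 2, 2, 5, 8, 8, 4)

changed <- ID - lag(ID, default = first(ID)) != 0  # TRUE where a new ID starts
run_id  <- cumsum(changed)                          # 0-based run counter
new_col <- run_id %/% 5                             # integer division: 5 runs per label

print(changed)
print(run_id)
print(new_col)
```

Here the first five runs (IDs 7, 3, 9, 2, 5) get label 0 and the remaining runs (8, 4) get label 1.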
table(d$new_col)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
This should also work when each ID spans a different number of rows, as long as all rows of an ID are contiguous.
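A quick way to check the variable-length claim is to run the expression on data with unequal runs and count the distinct IDs per label. The sketch below uses made-up IDs and run lengths, and also shows a base-R alternative built on rle() (a swapped-in technique, not the answer's), which should give the same labels.

```r
library(dplyr)

# Toy data with unequal run lengths; the IDs and lengths are made up.
d_var <- tibble(ID = rep(c(11, 4, 27, 8, 15, 3, 42, 6, 19, 30, 2, 9),
                         times = c(3, 1, 5, 2, 4, 1, 6, 2, 3, 1, 2, 4)))

# The answer's expression, unchanged
d_var <- d_var %>%
  mutate(new_col = cumsum(ID - lag(ID, default = first(ID)) != 0) %/% 5)

# Base-R alternative: rle() returns the length of each run of identical values,
# so repeating the 0-based run index by those lengths rebuilds the run counter.
runs <- rle(d_var$ID)
d_var$new_col_base <- rep(seq_along(runs$lengths) - 1, times = runs$lengths) %/% 5

# Distinct IDs and rows per label: 12 runs -> labels 0 and 1 (5 IDs each), 2 (2 IDs)
d_var %>% group_by(new_col) %>% summarise(n_ids = n_distinct(ID), n_rows = n())
```

Both versions assume each ID forms a single contiguous run; if an ID reappears later, it is counted as a new run.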