Reputation: 13
I have a large dataset and I want to create a new column that sets a value based on a condition. Here is an example:
library(tibble)
x <- tibble(
  x1 = c(rep("a", 3), rep("a1", 3), rep("a2", 3))
)
I would like a new column that identifies rows sharing the same value in x1 by giving each distinct value its own number. The end result should look like the following:
x <- tibble(
  x1 = c(rep("a", 3), rep("a1", 3), rep("a2", 3)),
  x2 = c(rep(1, 3), rep(2, 3), rep(3, 3))
)
Is there an easy way to do this? Maybe in dplyr? Thanks for the help.
Upvotes: 1
Views: 140
Reputation: 101064
A data.table option using .GRP
> library(data.table)
> setDT(x)[, x2 := .GRP, x1][]
   x1 x2
1:  a  1
2:  a  1
3:  a  1
4: a1  2
5: a1  2
6: a1  2
7: a2  3
8: a2  3
9: a2  3
or rleid (thanks to @akrun's comment)
> setDT(x)[, x2 := rleid(x1)][]
   x1 x2
1:  a  1
2:  a  1
3:  a  1
4: a1  2
5: a1  2
6: a1  2
7: a2  3
8: a2  3
9: a2  3
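One caveat that may be worth sketching: .GRP numbers distinct groups, whereas rleid numbers consecutive runs, so the two agree only when equal values sit next to each other, as they do in the example data. The base R snippet below mimics both behaviours without data.table. The vector `v` and the names `group_id` and `run_id` are made up for illustration and are not part of the question.

```r
# Hypothetical input where "a" appears in two separate runs
v <- c("a", "a", "b", "a")

# Group index by order of first appearance
# (what .GRP yields when grouping by v with `by`)
group_id <- match(v, unique(v))

# Run index: increases every time the value changes (rleid-like behaviour)
r <- rle(v)
run_id <- rep(seq_along(r$lengths), r$lengths)

group_id  # 1 1 2 1
run_id    # 1 1 2 3
```

If the same value can reappear later in the column and should keep its original number, the .GRP form is probably the safer choice.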
Upvotes: 1
Reputation: 886948
We can use match
library(dplyr)
x <- x %>%
  mutate(x2 = match(x1, unique(x1)))
Or do a grouping and get the group index with cur_group_id
x <- x %>%
  group_by(x1) %>%
  mutate(x2 = cur_group_id()) %>%
  ungroup()
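The two approaches give the same numbers for the example data, but they can differ when the values of x1 are not already in sorted order: match numbers values by first appearance, while cur_group_id follows the sorted order of the group keys that group_by produces. A rough base R sketch of the difference, assuming that sorted-order behaviour and using as.integer(factor()) as a stand-in for it (the vector `v` and the names `by_appearance` and `by_sorted` are hypothetical):

```r
# Hypothetical input whose first value is not the smallest
v <- c("b", "b", "a", "c")

# Numbered by order of first appearance, as match(x1, unique(x1)) does
by_appearance <- match(v, unique(v))

# Numbered by sorted order of the distinct values,
# used here as a stand-in for grouping followed by cur_group_id()
by_sorted <- as.integer(factor(v))

by_appearance  # 1 1 2 3
by_sorted      # 2 2 1 3
```

Which numbering is preferable depends on whether the ids should reflect the row order of the data or the ordering of the values themselves.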
Upvotes: 0