question

Reputation: 13

Conditional row indexing in R

I have a large dataset and I want to create a new column that sets a value based on a condition. Here is an example:

library(tibble)

x <- tibble(
  x1 = c(rep("a", 3), rep("a1", 3), rep("a2", 3))
)

I would like a new column that gives every row the same integer ID when it has the same value in column x1. The end result should look like the following:

x <- tibble(
  x1 = c(rep("a", 3), rep("a1", 3), rep("a2", 3)),
  x2 = c(rep(1, 3), rep(2, 3), rep(3, 3))
)

Is there an easy way to do this? Maybe in dplyr? Thanks for the help.

Upvotes: 1

Views: 140

Answers (2)

ThomasIsCoding

Reputation: 101064

A data.table option using .GRP

> library(data.table)
> setDT(x)[, x2 := .GRP, x1][]
   x1 x2
1:  a  1
2:  a  1
3:  a  1
4: a1  2
5: a1  2
6: a1  2
7: a2  3
8: a2  3
9: a2  3

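Or use rleid (thanks to @akrun's comment). Note that rleid numbers runs of consecutive values, so it gives the same result as .GRP only when identical values sit in adjacent rows, as they do here.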

> setDT(x)[, x2 := rleid(x1)][]
   x1 x2
1:  a  1
2:  a  1
3:  a  1
4: a1  2
5: a1  2
6: a1  2
7: a2  3
8: a2  3
9: a2  3
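The two data.table calls agree above only because equal values are contiguous. A minimal sketch, assuming a small hypothetical table dt in which "a" reappears after "a1", shows where they diverge: .GRP gives every row of the same group one number, while rleid starts a new number at every change. The column names grp and run are illustrative only.

```r
library(data.table)

# Hypothetical input: "a" appears again after "a1", so the values are not contiguous
dt <- data.table(x1 = c("a", "a", "a1", "a"))

# .GRP: one id per distinct value of x1, numbered in order of first appearance
dt[, grp := .GRP, by = x1]

# rleid: a new id every time x1 differs from the previous row
dt[, run := rleid(x1)]

print(dt)
```

Here grp is 1, 1, 2, 1 while run is 1, 1, 2, 3, so .GRP is the safer choice when the data are not already sorted by x1.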

Upvotes: 1

akrun

Reputation: 886948

We can use match

library(dplyr)
x <- x %>%
  mutate(x2 = match(x1, unique(x1)))
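match(x1, unique(x1)) works because unique() keeps values in order of first appearance, so each value's position in that vector becomes its ID. A minimal sketch of the same idea in base R, assuming the question's data stored as a plain data frame instead of a tibble:

```r
# Same values as the question, as a base R data frame
x <- data.frame(x1 = c(rep("a", 3), rep("a1", 3), rep("a2", 3)))

# unique(x$x1) is c("a", "a1", "a2"); match() returns each value's position in it
x$x2 <- match(x$x1, unique(x$x1))

print(x)
```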

Or group by x1 and get the group index with cur_group_id():

x <- x %>%
  group_by(x1) %>%
  mutate(x2 = cur_group_id()) %>%
  ungroup()
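The two dplyr versions can number groups differently: group_by() sorts the group keys, so cur_group_id() follows sorted order, while match(x1, unique(x1)) follows order of first appearance. In the question the values are already sorted, so both give 1, 2, 3. A minimal sketch with a hypothetical unsorted table y (column names by_appearance and by_sorted are illustrative only) shows the difference:

```r
library(dplyr)

# Hypothetical input: "b" appears before "a"
y <- tibble(x1 = c("b", "b", "a"))

y <- y %>%
  mutate(by_appearance = match(x1, unique(x1))) %>%  # "b" -> 1, "a" -> 2
  group_by(x1) %>%
  mutate(by_sorted = cur_group_id()) %>%              # "a" -> 1, "b" -> 2
  ungroup()

print(y)
```

Pick match() if the IDs should follow the row order of the data, and cur_group_id() if they should follow the sorted values.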

Upvotes: 0
