Reputation: 2421
I have a large data set in R that is organised with multiple records from individual cases, nested within groups. A toy example is here:
d = data.frame(group = rep(c('control','patient'), each = 5), case = c('a', 'a', 'b', 'c', 'c', 'd','d','d','e','e'))
If group_by(group, case) is applied in a dplyr chain, how can I create a column that numbers each row by the order of its case within the group? For example, in the third column below, case 'a' is the first case in the control group and case 'c' the third, but the numbering resets to 1 for case 'd', the first case in the patient group.
group case number
control a 1
control a 1
control b 2
control c 3
control c 3
patient d 1
patient d 1
patient d 1
patient e 2
patient e 2
I can see how this could be done by counting cases in a 'for' loop, but is there a way to achieve it within a standard dplyr-style chain of operations?
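A minimal sketch of one dplyr chain that produces the table above, assuming dplyr is installed. It groups by group only (not by group and case), so all of a group's rows are visible together, and then uses base R's match() against unique() to number each case by its first appearance within the group. The name result is just for illustration.

```r
library(dplyr)

d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# Group by `group` alone; within each group, match() returns the position of
# each case among that group's distinct cases, in order of first appearance
result <- d %>%
  group_by(group) %>%
  mutate(number = match(case, unique(case))) %>%
  ungroup()

print(result)
```

Because match() works on order of appearance rather than sorted order, the numbering does not depend on the case labels being alphabetical.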
Reputation: 5456
One solution would be:
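library(dplyr)
library(tibble)

# Numbers each distinct case across the whole data set (a = 1 ... e = 5),
# so this first version does NOT reset the numbering between groups
want <- left_join(d,
                  d %>%
                    distinct(case) %>%
                    rownames_to_column(var = "number"),
                  by = "case")

# Added later: number the cases separately within each group
want2 <- left_join(d,
                   bind_rows(
                     d %>%
                       filter(group == "control") %>%
                       distinct(case) %>%
                       rownames_to_column(var = "number"),
                     d %>%
                       filter(group == "patient") %>%
                       distinct(case) %>%
                       rownames_to_column(var = "number")),
                   by = "case")

# The same without naming the groups by hand; I think this is less readable:
want3 <- left_join(d,
                   bind_rows(lapply(split(d, d$group), function(x) x %>%
                                      distinct(case) %>%
                                      rownames_to_column(var = "number"))),
                   by = "case")

# Note: rownames_to_column() stores `number` as character; use as.integer() if needed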
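The join idea in want2 can be generalised so that the groups do not have to be listed by hand. This is a sketch, assuming dplyr; lookup and want4 are illustrative names, and joining on both group and case (so the same case label in two groups cannot collide) is a choice made here, not in the answer above.

```r
library(dplyr)

d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# One row per (group, case) pair, kept in order of first appearance,
# then numbered 1, 2, ... within each group
lookup <- d %>%
  distinct(group, case) %>%
  group_by(group) %>%
  mutate(number = row_number()) %>%
  ungroup()

# Attach the per-group number to every original row
want4 <- left_join(d, lookup, by = c("group", "case"))

print(want4)
```

Unlike the rownames_to_column() versions, row_number() gives an integer column directly.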
Reputation: 887251
We can use data.table:
library(data.table)
# levels = unique(case) numbers the cases by first appearance within each group
setDT(d)[, numbers := as.numeric(factor(case, levels = unique(case))), by = group]
Reputation: 7153
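library(dplyr)
# droplevels() needs a factor; since R 4.0, data.frame() stores `case` as character
group_by(d, group) %>%
  mutate(number = droplevels(as.factor(case)) %>% as.numeric)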
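A caveat on the factor-based answers: factor() sorts its levels by default, so as.numeric() on the factor numbers cases alphabetically. That matches the expected output here only because the case labels happen to appear in alphabetical order (the data.table answer avoids this by passing levels = unique(case)). The sketch below, assuming dplyr, contrasts dplyr's dense_rank() (sorted order, works directly on a character column) with match(x, unique(x)) (first-appearance order) on a hypothetical vector x.

```r
library(dplyr)

# Hypothetical vector whose first value is not the alphabetically first one
x <- c("z", "z", "a")

by_sort  <- dense_rank(x)        # sorted order: 2 2 1
by_order <- match(x, unique(x))  # first-appearance order: 1 1 2

# On the question's data both orders coincide, so dense_rank() gives the expected numbers
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)
res <- d %>%
  group_by(group) %>%
  mutate(number = dense_rank(case)) %>%
  ungroup()

print(res)
```

Which order is wanted depends on whether "the order of its case" means sorted label order or the order in which cases first occur in the data.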