Reputation: 2421

Count level within group_by hierarchy in dplyr

I have a large data set in R that is organised with multiple records from individual cases, nested within groups. A toy example is here:

d = data.frame(group = rep(c('control','patient'), each = 5), case = c('a', 'a', 'b', 'c', 'c', 'd','d','d','e','e'))

After applying group_by(group, case) in a dplyr chain, how can I create a column that numbers each row by the order of its case within its group? For example, in the third column below, case 'a' is the first case in the control group and case 'c' is the third, but the numbering resets to 1 for case 'd', the first case in the patient group.

  group case number
control    a      1
control    a      1
control    b      2
control    c      3
control    c      3
patient    d      1
patient    d      1
patient    d      1
patient    e      2
patient    e      2

I can see how this could be done by counting cases in a 'for' loop, but I am wondering whether there is a way to achieve this within a standard dplyr-style chain of operations.

Upvotes: 3

Answers (3)

r.user.05apr

Reputation: 5456

One solution would be:

library(dplyr)
library(tibble)

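# Numbers the cases across the whole data set (no reset per group);
# rownames_to_column() stores the number as character
want <- left_join(d,
                  d %>%
                    distinct(case) %>%
                    rownames_to_column(var = "number"),
                  by = "case")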

# Added later: number the cases separately within each group
want2 <- left_join(d,
                   bind_rows(
                     d %>%
                       filter(group == "control") %>%
                       distinct(case) %>%
                       rownames_to_column(var = "number"),
                     d %>%
                       filter(group == "patient") %>%
                       distinct(case) %>%
                       rownames_to_column(var = "number")),
                   by = "case")

# I think this is less readable:
want3<-left_join(d,
                 bind_rows(by(d,d$group,function(x) x %>%
                                distinct(case) %>%
                                rownames_to_column(var="number"))),
                 by="case")
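
The same lookup-and-join idea can be sketched more compactly by building the lookup table for all groups in one step. This is only a sketch under assumptions: it uses the toy data frame d from the question, and the names lookup and want4 are illustrative. Joining on both group and case also guards against a case label being reused in more than one group.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(group = rep(c('control', 'patient'), each = 5),
                case = c('a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'e', 'e'))

# One row per (group, case) pair, kept in order of first appearance,
# then numbered within each group
lookup <- d %>%
  distinct(group, case) %>%
  group_by(group) %>%
  mutate(number = row_number()) %>%
  ungroup()

# Attach the per-group case number to every original row
want4 <- left_join(d, lookup, by = c("group", "case"))
```

Here number is an integer column rather than character, since it comes from row_number() instead of row names.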

Upvotes: 1

akrun

Reputation: 887981

We can use data.table:

library(data.table)
# Number the cases by order of first appearance within each group
setDT(d)[, number := as.numeric(factor(case, levels = unique(case))), by = group]

Upvotes: 1

Adam Quek

Reputation: 7163

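library(dplyr)

# factor() with levels = unique(case) numbers cases by first appearance
# and works whether case is a character vector or a factor
group_by(d, group) %>%
  mutate(number = as.numeric(factor(case, levels = unique(case))))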
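
As a minimal sketch of a related variant, match(case, unique(case)) inside group_by(group) numbers the cases by order of first appearance and does not depend on case being a factor. It assumes the toy data frame d from the question; the names res and expected are illustrative, and the expected values are copied from the table in the question.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(group = rep(c('control', 'patient'), each = 5),
                case = c('a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'e', 'e'))

res <- d %>%
  group_by(group) %>%
  # position of each case among the distinct cases of its group
  mutate(number = match(case, unique(case))) %>%
  ungroup()

# Third column of the table in the question
expected <- c(1L, 1L, 2L, 3L, 3L, 1L, 1L, 1L, 2L, 2L)
stopifnot(identical(res$number, expected))
```

Because unique() preserves order of appearance, the numbering follows the row order of the data rather than the alphabetical order of the case labels.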

Upvotes: 1
