Michael MacAskill

Reputation: 2421

Count level within group_by hierarchy in dplyr

I have a large data set in R that is organised with multiple records from individual cases, nested within groups. A toy example is here:

d = data.frame(group = rep(c('control','patient'), each = 5), case = c('a', 'a', 'b', 'c', 'c', 'd','d','d','e','e'))

If group_by(group, case) is applied in a dplyr chain, how can I create a column that numbers each row by the order of its case within the group? For example, in the third column below, case 'a' is the first case in the control group and case 'c' is the third, but the numbering resets to 1 for case 'd', the first case in the patient group.

  group case number
control    a      1
control    a      1
control    b      2
control    c      3
control    c      3
patient    d      1
patient    d      1
patient    d      1
patient    e      2
patient    e      2

I can see how this would be done by counting cases using a 'for' loop, but am wondering if there is a way to achieve this within a standard dplyr-style chain of operations?

Upvotes: 3

Views: 1165

Answers (3)

r.user.05apr

Reputation: 5456

One solution would be:

library(dplyr)
library(tibble)

# Numbers the cases across the whole data set (1 to 5); it does not reset within each group.
want<-left_join(d,
                d %>%
                  distinct(case) %>%
                  rownames_to_column(var="number") ,
                by="case")

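# Added later: number the cases separately within each group.
# Note that rownames_to_column() returns "number" as a character column.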
want2<-left_join(d,
                 bind_rows(
                   d %>%
                     filter(group=="control") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number"),
                   d %>%
                     filter(group=="patient") %>%
                     distinct(case) %>%
                     rownames_to_column(var="number")),
                 by="case")

# I think this is less readable:
want3<-left_join(d,
                 bind_rows(by(d,d$group,function(x) x %>%
                                distinct(case) %>%
                                rownames_to_column(var="number"))),
                 by="case")
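
The per-group lookup in want2 can also be built without naming each group explicitly. The following is only a sketch of that idea, not part of the original answer: the names lookup and want4 are made up for illustration, and it assumes cases should be numbered in order of first appearance within each group.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# Hypothetical helper table: one row per (group, case) pair,
# numbered within each group in order of first appearance
lookup <- d %>%
  distinct(group, case) %>%
  group_by(group) %>%
  mutate(number = row_number()) %>%
  ungroup()

# Join the numbers back onto every record
want4 <- left_join(d, lookup, by = c("group", "case"))
want4
```

Joining on both group and case keeps the result correct even if the same case label were to occur in more than one group, and number comes out as an integer rather than a character column.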

Upvotes: 1

akrun

Reputation: 887251

We can use data.table

library(data.table)
setDT(d)[, number := as.numeric(factor(case, levels = unique(case))), by = group]
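
If the rows of each case are contiguous, as they are in the toy data, a run-length id might be a shorter alternative. This is a sketch under that assumption, not part of the original answer; the name dt is made up for illustration, and the numbering would be wrong if a case reappeared later in the same group.

```r
library(data.table)

# Toy data from the question, built directly as a data.table
dt <- data.table(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

# rleid() starts a new id whenever `case` changes;
# by = group restarts the count at 1 for each group
dt[, number := rleid(case), by = group]
dt
```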

Upvotes: 1

Adam Quek

Reputation: 7153

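library(dplyr)
# factor() keeps this working in R >= 4.0, where `case` is a character column
group_by(d, group) %>%
   mutate(number = droplevels(factor(case)) %>% as.numeric)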
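
A variant of the same grouped mutate that avoids factors altogether could look like the sketch below. It is not part of the original answer: the name res is made up for illustration, and it assumes cases should be numbered in order of first appearance rather than alphabetically.

```r
library(dplyr)

# Toy data from the question
d <- data.frame(
  group = rep(c("control", "patient"), each = 5),
  case  = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e")
)

res <- d %>%
  group_by(group) %>%
  # match() gives the position of each case among the group's unique cases
  mutate(number = match(case, unique(case))) %>%
  ungroup()
res
```

If numbering by sorted order is preferred, dplyr's dense_rank(case) can be used in place of the match() call; for the toy data both give the same numbers.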

Upvotes: 1
