Andrew Taylor

Reputation: 3488

Summarize (count/freq) by treatment type where individuals could receive both treatments

Say we have this data:

dat<-data.frame(id=c(1,1,2,2,3,4,4,5,6,6),Rx=c(1,2,1,2,1,1,1,2,2,2))

   id Rx
1   1  1
2   1  2
3   2  1
4   2  2
5   3  1
6   4  1
7   4  1
8   5  2
9   6  2
10  6  2

Here id is the subject ID and Rx is the treatment received. There are repeated observations per subject, and a subject's treatment may or may not be the same across observations.

I want to be able to summarize how many subjects only received Rx 1, only received Rx 2, and how many received Rx 1 and 2.

I'd prefer a dplyr solution, but data.table and base R would be fine too. I thought something like:

dat %>%
  group_by(id,Rx) %>%
  unique() %>%
  ...something

The end result should be something like:

  Rx    Count
   1        2
   2        2
Both        2

Thanks!

Upvotes: 8

Views: 304

Answers (3)

David Arenburg

Reputation: 92292

Here's another generalized solution

library(dplyr)
dat %>%
  group_by(id) %>%
  summarise(indx = toString(sort(unique(Rx)))) %>%
  ungroup() %>%
  count(indx)

# Source: local data table [3 x 2]
# 
#   indx n
# 1 1, 2 2
# 2    1 2
# 3    2 2

Similarly, with data.table:

library(data.table)
setDT(dat)[, .(indx = toString(sort(unique(Rx)))), id][ , .N, indx]
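To get the exact layout asked for in the question (a "Both" row and a Count column), the combination label can be recoded after counting. This is only a sketch: it assumes exactly two treatments coded 1 and 2, and the names res and Count are chosen here for illustration.

```r
library(dplyr)

# Data from the question
dat <- data.frame(id = c(1, 1, 2, 2, 3, 4, 4, 5, 6, 6),
                  Rx = c(1, 2, 1, 2, 1, 1, 1, 2, 2, 2))

res <- dat %>%
  group_by(id) %>%
  # One label per subject, e.g. "1", "2" or "1, 2"
  summarise(indx = toString(sort(unique(Rx)))) %>%
  # Number of subjects per label
  count(indx, name = "Count") %>%
  # Assumption: only treatments 1 and 2 exist, so "1, 2" means both
  mutate(Rx = ifelse(indx == "1, 2", "Both", indx)) %>%
  select(Rx, Count)

res
```

With more than two treatments the "Both" label no longer makes sense, so the raw combination string is probably the better output in that case.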

Upvotes: 5

davechilders

Reputation: 9123

This solution does not generalize well to more than 2 treatments:

library(dplyr)

dat %>%
  distinct(id, Rx) %>%
  group_by(id) %>%
  mutate(
    trt1 = setequal(1, Rx), # change due to comment from @Marat Talipov
    trt2 = setequal(2, Rx),
    both = setequal(1:2, Rx)
    ) %>%
  ungroup() %>%
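  distinct(id, .keep_all = TRUE) %>%  # one row per subject; current dplyr drops other columns without .keep_all
  summarise(across(trt1:both, sum))   # replaces the deprecated summarise_each(funs(sum), ...)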

This solution is shorter and does generalize to more than two treatments:

library(stringr)

dat %>%
  group_by(id) %>%
  mutate(
    rx_list = str_c(sort(unique(Rx)), collapse = ",")
    ) %>%
  ungroup() %>%
  distinct(id, rx_list) %>%  # one row per subject; keeps rx_list in current dplyr
  count(rx_list)
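As a rough check of the claim that this approach generalizes, here is a sketch on a small made-up data set with a third treatment. The data frame dat3 and its values are hypothetical, not from the question, and base paste() stands in for str_c() so the snippet only needs dplyr.

```r
library(dplyr)

# Hypothetical data: same layout as dat, plus a made-up treatment 3
dat3 <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                   Rx = c(1, 3, 1, 2, 3, 3))

res3 <- dat3 %>%
  group_by(id) %>%
  # Collapse each subject's distinct treatments into one label
  summarise(rx_list = paste(sort(unique(Rx)), collapse = ",")) %>%
  # Count subjects per treatment combination
  count(rx_list)

res3
```

Only combinations that actually occur in the data get a row; nothing is hard-coded per treatment.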

Upvotes: 3

nicola

Reputation: 24480

Not exactly the output you have indicated, but it's base R, a one-liner, and general:

table(do.call(function(...) paste(..., sep = "_"), as.data.frame(table(dat) > 0)))
# FALSE_TRUE TRUE_FALSE  TRUE_TRUE 
#          2          2          2

If there are more than two treatments, the output covers every combination of treatments that occurs in the data.
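If the TRUE/FALSE labels are hard to read, a base R variant can label each subject by the treatments received instead. This is a sketch of an alternative, not the one-liner above; the names combo and counts and the "+" separator are arbitrary choices made here.

```r
# Data from the question
dat <- data.frame(id = c(1, 1, 2, 2, 3, 4, 4, 5, 6, 6),
                  Rx = c(1, 2, 1, 2, 1, 1, 1, 2, 2, 2))

# One label per subject: the distinct treatments, sorted and joined with "+"
combo <- tapply(dat$Rx, dat$id,
                function(x) paste(sort(unique(x)), collapse = "+"))

# Number of subjects per treatment combination
counts <- table(combo)
counts
```

Here the table names are the treatment combinations themselves ("1", "2", "1+2"), and the same code works unchanged with more treatments.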

Upvotes: 2
