Tom O
Tom O

Reputation: 1597

Count occurence across multiple columns using R & dplyr

This should be a simple solution...I just can't wrap my head around this. I'd like to count the occurrences of a factor across multiple columns of a data frame. There're 13 columns range from abx.1 > abx.13 and a huge number of rows.

Sample data frame:

library(dplyr)
 abx.1 <- c('Amoxil', 'Cipro', 'Moxiflox', 'Pip-tazo')
 start.1 <- c('2012-01-01', '2012-02-01', '2013-01-01', '2014-01-01')
 abx.2 <- c('Pip-tazo', 'Ampicillin', 'Amoxil', NA)
 start.2 <- c('2012-01-01', '2012-02-01', '2013-01-01', NA)
 abx.3 <- c('Ampicillin', 'Amoxil', NA, NA)
 start.3 <- c('2012-01-01', '2012-02-01', NA,NA)
 worksheet <-data.frame (abx.1, start.1, abx.2, start.2, abx.3, start.3)

Result I'd like:

  1. name count
  2. Amoxil 3
  3. Ampicillin 2
  4. Pip-tazo 2
  5. Cipro 1
  6. Moxiflox 1

I've tried :

worksheet %>% group_by (abx.1, abx.2, abx.3) %>% summarise(count = n())

This doesn't give me my desired output. Any thoughts would be greatly appreciated.

Upvotes: 4

Views: 6268

Answers (1)

David Arenburg
David Arenburg

Reputation: 92300

If you want a dplyr solution, I'd suggest combining it with tidyr in order to convert your data to a long format first

library(tidyr)
worksheet %>%
  select(starts_with("abx")) %>%
  gather(key, value, na.rm = TRUE) %>%
  count(value)

# Source: local data frame [5 x 2]
# 
#        value n
# 1     Amoxil 3
# 2 Ampicillin 2
# 3      Cipro 1
# 4   Moxiflox 1
# 5   Pip-tazo 2

Alternatively, with base R, it's just

as.data.frame(table(unlist(worksheet[grep("^abx", names(worksheet))])))
#         Var1 Freq
# 1     Amoxil    3
# 2      Cipro    1
# 3   Moxiflox    1
# 4   Pip-tazo    2
# 5 Ampicillin    2

Upvotes: 7

Related Questions