calculate frequency count of 2 columns simultaneously

Question

I am running some analyses involving multiple data frames. I want to merge the dataframes and identify the frequency that the person's ID and visit date that appear more than once. Below is some sample data:

df1:
id      visit
121-2   1998-04-05
121-2   1998-04-06
191-4   2015-05-01 
191-4   2015-05-02

df2:
id      visit
121-2   1998-04-05
183-4   2019-05-06
183-4   2019-05-07
8878-2  2015-12-04

I want this:

id      visit       freq 
121-2   1998-04-05   2
121-2   1998-04-06   1
191-4   2015-05-01   1
191-4   2015-05-02   1
183-4   2019-05-06   1
183-4   2019-05-07   1
8878-2  2015-12-04   1

I tried this code:

cbind.fill <- function(...){
  nm <- list(...) 
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow)) 
  do.call(cbind, lapply(nm, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 
}

a <- cbind.fill(df1,df2)
out <- as.data.frame(table(unlist(a)))
x <- out[order(-out$Freq),]

However, the issue is that I am getting a df that does not include "visit" and it appears to be ignoring visit so the # of times an ID appears is greater than it should be because visit is not being taken into account.

Can someone please help?

Ronak Shah · Accepted Answer

You can bind the two dataframes and count number of rows for each id and visit

library(dplyr)
df1 %>% bind_rows(df2) %>% count(id, visit)

#      id      visit n
#1  121-2 1998-04-05 2
#2  121-2 1998-04-06 1
#3  183-4 2019-05-06 1
#4  183-4 2019-05-07 1
#5  191-4 2015-05-01 1
#6  191-4 2015-05-02 1
#7 8878-2 2015-12-04 1

In base R you could do :

aggregate(id~id+visit, rbind(df1, df2), length)

calculate frequency count of 2 columns simultaneously

Answers (2)

Related Questions