Reputation: 635
I am running some analyses involving multiple data frames. I want to merge the dataframes and identify the frequency that the person's ID and visit date that appear more than once. Below is some sample data:
df1:
id visit
121-2 1998-04-05
121-2 1998-04-06
191-4 2015-05-01
191-4 2015-05-02
df2:
id visit
121-2 1998-04-05
183-4 2019-05-06
183-4 2019-05-07
8878-2 2015-12-04
I want this:
id visit freq
121-2 1998-04-05 2
121-2 1998-04-06 1
191-4 2015-05-01 1
191-4 2015-05-02 1
183-4 2019-05-06 1
183-4 2019-05-07 1
8878-2 2015-12-04 1
I tried this code:
cbind.fill <- function(...){
nm <- list(...)
nm <- lapply(nm, as.matrix)
n <- max(sapply(nm, nrow))
do.call(cbind, lapply(nm, function (x)
rbind(x, matrix(, n-nrow(x), ncol(x)))))
}
a <- cbind.fill(df1,df2)
out <- as.data.frame(table(unlist(a)))
x <- out[order(-out$Freq),]
However, the issue is that I am getting a df that does not include "visit" and it appears to be ignoring visit so the # of times an ID appears is greater than it should be because visit is not being taken into account.
Can someone please help?
Upvotes: 0
Views: 37
Reputation: 887118
We can use rbindlist
library(data.table)
rbindlist(list(dt1, df2))[, .N, .(id, visit)]
Upvotes: 0
Reputation: 388982
You can bind the two dataframes and count number of rows for each id
and visit
library(dplyr)
df1 %>% bind_rows(df2) %>% count(id, visit)
# id visit n
#1 121-2 1998-04-05 2
#2 121-2 1998-04-06 1
#3 183-4 2019-05-06 1
#4 183-4 2019-05-07 1
#5 191-4 2015-05-01 1
#6 191-4 2015-05-02 1
#7 8878-2 2015-12-04 1
In base R you could do :
aggregate(id~id+visit, rbind(df1, df2), length)
Upvotes: 1