Comparing two dataframes based on a condition

Question

I have the following two data frames. df1 contains the average size of a parameter for each user. df2 contains the average daily value of the same parameter for those users.

For each user, I want to count the number of days on which df2$size > df1$size.

df1 = read.table(text='user  size
AAL0706 29000
AAN0823 25000
AAV0450 30000', stringsAsFactors=FALSE, header=TRUE)


df2 = read.table(text='Date    user    size
      2010-01-04 AAL0706 31054
      2010-01-06 AAL0706 20703
      2010-01-08 AAL0706 39968
      2010-01-04 AAN0823 17892
      2010-01-06 AAN0823 37839
      2010-01-08 AAN0823 19649
      2010-01-04 AAV0450 35432
      2010-01-06 AAV0450 37839', stringsAsFactors=FALSE, header=TRUE)

Expected output is:

   user  count
AAL0706      2
AAN0823      1
AAV0450      2

I have tried to use the following command to calculate my results, but I realise something is wrong.

lapply(df1, function(y) { 
    ddply(df2$size, .(user), function(x) { 
        return(length(y$size(y$size > x$size))
    })
})

Could you please advise me on an efficient way of doing this?

akrun · Accepted Answer

We can do a left_join from dplyr, group by 'user', and get the sum of the logical index (size.x > size.y). After the join, size.x is the daily value from df2 and size.y is the average from df1.

library(dplyr)
left_join(df2, df1, by = "user") %>% 
            group_by(user) %>% 
            summarise(Count = sum(size.x > size.y))
#       user Count
#      <chr> <int>
#1 AAL0706     2
#2 AAN0823     1
#3 AAV0450     2
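
The count works because R treats TRUE as 1 and FALSE as 0 when summing a logical vector. A minimal sketch for one user, using AAL0706's values from the question (the variable names daily, average and above are only illustrative):

```r
# AAL0706's three daily values from df2 and the average from df1
daily   <- c(31054, 20703, 39968)
average <- 29000

# Element-wise comparison gives one logical value per day
above <- daily > average

# Summing the logical vector counts the days that exceed the average
count <- sum(above)
count
```

The same comparison runs once per group inside summarise() in the dplyr pipeline above.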

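Or using data.table, joining on 'user' and computing the sum once for each row of df1 (by = .EACHI). Here i.size refers to the size column of df1.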

library(data.table)
setDT(df2)[df1, .(count = sum(size > i.size)), on = "user", by = .EACHI]
#      user count
#1: AAL0706     2
#2: AAN0823     1
#3: AAV0450     2
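
If neither package is available, the same join-then-count idea can probably be done in base R with merge() and aggregate(). This is a sketch rather than part of the answer above; the suffixes .daily and .avg and the names merged, above and result are chosen here for illustration only.

```r
# Rebuild the question's data so the sketch is self-contained
df1 <- read.table(text = 'user  size
AAL0706 29000
AAN0823 25000
AAV0450 30000', stringsAsFactors = FALSE, header = TRUE)

df2 <- read.table(text = 'Date    user    size
2010-01-04 AAL0706 31054
2010-01-06 AAL0706 20703
2010-01-08 AAL0706 39968
2010-01-04 AAN0823 17892
2010-01-06 AAN0823 37839
2010-01-08 AAN0823 19649
2010-01-04 AAV0450 35432
2010-01-06 AAV0450 37839', stringsAsFactors = FALSE, header = TRUE)

# Join the daily values to the per-user averages; the two 'size'
# columns are disambiguated with illustrative suffixes
merged <- merge(df2, df1, by = "user", suffixes = c(".daily", ".avg"))

# Flag the days on which the daily value exceeds the user's average
merged$above <- merged$size.daily > merged$size.avg

# Sum the flags per user (TRUE counts as 1) and rename the result column
result <- aggregate(above ~ user, data = merged, FUN = sum)
names(result)[2] <- "count"
result
```

Note that merge() drops users that appear in only one of the two data frames unless all.x or all.y is set, so a user with no daily rows would be missing from the result rather than shown with a count of 0.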
