Reputation: 83
I have following two data frames. df1
consists of an average size of a parameter for individual users. df2
contains the average daily value for the same parameter for users.
I want to count the number of days for the events where df2$size > df1$size
for each user
.
df1 = read.table(text='user size
AAL0706 29000
AAN0823 25000
AAV0450 30000', stringsAsFactors=FALSE, header=TRUE)
df2 = read.table(text='Date user size
2010-01-04 AAL0706 31054
2010-01-06 AAL0706 20703
2010-01-08 AAL0706 39968
2010-01-04 AAN0823 17892
2010-01-06 AAN0823 37839
2010-01-08 AAN0823 19649
2010-01-04 AAV0450 35432
2010-01-06 AAV0450 37839', stringsAsFactors=FALSE, header=TRUE)
Expected output is:
user count
AAL0706 2
AAN0823 1
AAV0450 2
I have tried to use the following command to calculate my results, but I realise something is wrong.
lapply(df1, function(y) {
ddply(df2$size, .(user), function(x) {
return(length(y$size(y$size > x$size))
})
})
Could you please advice me of an efficient way of doing this?
Upvotes: 2
Views: 120
Reputation: 118889
A slightly more straightforward solution using data.table would be to use the new non-equi
joins feature available in the current development version of data.table, v1.9.7.
require(data.table)
setDT(df2)[df1, .N, on=.(user, size > size), by=.EACHI]
Each row of df1
is matched against all rows of df2
based on the condition provided to the on
argument, i.e., match for exact value of user
and look for all rows where size
of df2
is greater within that user
.
Once the matching rows are obtained (for each row), the expression .N
(= count of matching rows) is evaluated for each row, because by = .EACHI
implies this. It instructs to execute the expression provided to the second argument j
to run for each i
(the first argument).
See installation instructions for devel version here.
Upvotes: 1
Reputation: 887971
We can do a left_join
from dplyr
, grouped by 'user' get the sum
of logical index (size.x > size.y
)
library(dplyr)
left_join(df2, df1, by = "user") %>%
group_by(user) %>%
summarise(Count = sum(size.x > size.y))
# user Count
# <chr> <int>
#1 AAL0706 2
#2 AAN0823 1
#3 AAV0450 2
Or using data.table
library(data.table)
setDT(df2)[df1, .(count = sum(size > i.size)),on = "user", by = .EACHI]
# user count
#1: AAL0706 2
#2: AAN0823 1
#3: AAV0450 2
Upvotes: 1