Reputation: 702
Yet another apply question.
I've reviewed a lot of documentation on the apply family of functions in R (and use them quite a bit in my work). I've defined a function myfun below, which I want to apply to every row of the data frame inc. I think I need some variant of apply(inc, 1, myfun).
I've played around with it for a while but still can't quite get it. I've included a loop that does exactly what I want... it's just very slow and inefficient on my real data, which is considerably larger than the sample data I've included here.
I expect it's a quick fix, but I can't quite put my finger on it... maybe something with the special argument ... to apply?
English version of what the code below does: for each Submit.Date in the inc data frame, I want to count the rows in chg that have the same ID and whose chg$Submit.Date falls within a window around that inc$Submit.Date. The window runs from bdays days before to fdays days after the date (both ends exclusive), and is controlled by the fdays and bdays arguments of myfun.
chgdf <- data.frame(Submit.Date=as.Date(c("2013-09-27", "2013-09-04", "2013-08-01", "2013-06-24", "2013-05-29", "2013-08-20")), ID=c("001", "001", "001", "001", "001", "005"), stringsAsFactors=FALSE)
incdf <- data.frame(Submit.Date=as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")), ID=c("001", "001", "002", "006"), stringsAsFactors=FALSE)
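myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # Window: strictly after tdate - bdays and strictly before tdate + fdays
  upper <- tdate + fdays
  lower <- tdate - bdays
  # Rows of chg with the same ID whose Submit.Date falls inside the window
  chg2 <- chg[chg$ID == aid & chg$Submit.Date < upper & chg$Submit.Date > lower, ]
  nrow(chg2)
}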
inc <- incdf  # the data frame referred to as inc in the text
aid <- '001'
tdate <- inc[inc$ID == aid, 'Submit.Date'][1]
myfun(tdate, aid = aid, bdays = 50, fdays = 100)
inc$chgw <- 0
for (i in seq_len(nrow(inc))) {
  aid <- inc$ID[i]
  tdate <- inc$Submit.Date[i]
  inc$chgw[i] <- myfun(tdate, aid, bdays = 50, fdays = 100)
}
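If the loop is too slow, one possible vectorized route (a sketch, not something proposed in the answers here) is to join inc and chg on ID once with base R's merge() and then count the in-window matches per incident with tabulate(). The helper count_changes() and the .row column are hypothetical names chosen for illustration; the window uses the same strict inequalities as myfun. Because the join pairs every incident with every change that shares its ID, memory use can still grow when many rows share an ID.

```r
# Sample data from the question
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: for each row of inc, count rows of chg with the same ID
# whose Submit.Date lies strictly inside (date - bdays, date + fdays).
count_changes <- function(inc, chg, fdays = 30, bdays = 30) {
  inc$.row <- seq_len(nrow(inc))  # remember which inc row each pair came from
  # All inc/chg pairs sharing an ID; the two date columns get .inc / .chg suffixes
  m <- merge(inc, chg, by = "ID", suffixes = c(".inc", ".chg"))
  in_window <- m$Submit.Date.chg > m$Submit.Date.inc - bdays &
               m$Submit.Date.chg < m$Submit.Date.inc + fdays
  # Count matches per inc row; rows with no match (or no shared ID) get 0
  tabulate(m$.row[in_window], nbins = nrow(inc))
}

incdf$chgw <- count_changes(incdf, chgdf, bdays = 50, fdays = 100)
print(incdf)
```

With bdays = 50 and fdays = 100 this should give the same counts as the loop above.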
Upvotes: 0
Views: 1194
Reputation: 52677
Similar to Julian's answer:
sapply(
split(incdf, 1:nrow(incdf)),
function(x) do.call(myfun, c(unname(x), bdays=50, fdays=100))
)
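Here I don't use apply because apply converts the data frame to a matrix, which coerces the whole row to a single type (here, character); that may not be desirable. Note that we need unname(x) because your data frame's column names (Submit.Date, ID) don't match the argument names of your function (tdate, aid). Without it, do.call would pass them as named arguments that myfun doesn't accept.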
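To see why unname(x) matters, the snippet below builds the argument list for the first row both ways. It is a standalone sketch that repeats the question's sample data and a condensed version of its myfun so it runs on its own; the variable names are only for illustration.

```r
# Sample data and a condensed copy of the question's two-argument myfun()
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)
myfun <- function(tdate, aid, chg = chgdf, inc = incdf, fdays = 30, bdays = 30) {
  chg2 <- chg[chg$ID == aid & chg$Submit.Date < tdate + fdays & chg$Submit.Date > tdate - bdays, ]
  nrow(chg2)
}

x <- split(incdf, seq_len(nrow(incdf)))[[1]]  # first row as a one-row data frame
names(c(x, bdays = 50, fdays = 100))           # "Submit.Date" "ID" "bdays" "fdays"

# Named: do.call() tries myfun(Submit.Date = ..., ID = ..., ...), which errors
named_call <- tryCatch(do.call(myfun, c(x, bdays = 50, fdays = 100)),
                       error = function(e) conditionMessage(e))

# Unnamed: the two columns fill tdate and aid by position
unnamed_call <- do.call(myfun, c(unname(x), bdays = 50, fdays = 100))
unnamed_call  # 2
```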
Upvotes: 2
Reputation: 8488
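First, when you call apply, the data frame is converted to a matrix, and because the columns have mixed types, all values are coerced to character strings. So you need to convert tdate back to a Date before using it; otherwise you're trying to add days to a string: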
tdate <- as.Date(tdate)
fdays <- tdate+fdays
bdays <- tdate-bdays
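A quick way to check this coercion on the question's incdf (a standalone sketch; the variable names are only for illustration):

```r
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)

# apply() works on as.matrix(incdf); a Date column plus a character column
# gives a character matrix
typeof(as.matrix(incdf))  # "character"

# So each row reaches FUN as a character vector, dates included
row_classes <- apply(incdf, 1, function(row) class(row[["Submit.Date"]]))

# Adding days to the string fails ("non-numeric argument to binary operator") ...
string_plus <- tryCatch("2013-10-19" + 30, error = function(e) "error")

# ... but works once the string is converted back with as.Date()
plus_30 <- apply(incdf, 1, function(row) format(as.Date(row[["Submit.Date"]]) + 30))
plus_30  # "2013-11-18" "2013-10-14" "2013-09-21" "2013-09-19"
```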
Second, you call apply(inc, 1, myfun). In that case you're passing a single argument to myfun (the whole row), not the separate arguments that myfun expects.
Solution 1: Change your function to take a whole row of the data frame, and call apply as you did:
myfun <- function(row, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  # apply() passes each row as a character vector: row[1] is the date, row[2] the ID
  tdate <- as.Date(row[1])
  upper <- tdate + fdays
  lower <- tdate - bdays
  chg2 <- chg[chg$ID == row[2] & chg$Submit.Date < upper & chg$Submit.Date > lower, ]
  nrow(chg2)
}
> apply(inc, 1, myfun)
[1] 1 2 0 0
Solution 2: Call apply with a wrapper that passes each parameter to the function explicitly:
myfun <- function(tdate, aid, chg=chgdf, inc=incdf, fdays=30, bdays=30) {
  upper <- tdate + fdays
  lower <- tdate - bdays
  # Rows of chg with the same ID inside the (lower, upper) window
  chg2 <- chg[chg$ID == aid & chg$Submit.Date < upper & chg$Submit.Date > lower, ]
  nrow(chg2)
}
> apply(inc, 1, function(row) myfun(as.Date(row[1]), row[2]))
[1] 1 2 0 0
I personally prefer the second solution, because it lets you change the default values of the other parameters of myfun:
> apply(inc, 1, function(row) myfun(as.Date(row[1]), row[2], bdays=50, fdays=50))
[1] 2 3 0 0
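The same override also works through apply()'s own ... argument, which is what the question asked about: named arguments given after FUN are forwarded to FUN. Base R's mapply() is another option; it walks the columns in parallel, so the Date column keeps its class, and MoreArgs supplies the fixed extra arguments on every call. Both are sketches built on the question's original two-argument myfun, not part of this answer.

```r
# Sample data and the question's two-argument myfun()
chgdf <- data.frame(Submit.Date = as.Date(c("2013-09-27", "2013-09-04", "2013-08-01",
                                            "2013-06-24", "2013-05-29", "2013-08-20")),
                    ID = c("001", "001", "001", "001", "001", "005"),
                    stringsAsFactors = FALSE)
incdf <- data.frame(Submit.Date = as.Date(c("2013-10-19", "2013-09-14", "2013-08-22", "2013-08-20")),
                    ID = c("001", "001", "002", "006"),
                    stringsAsFactors = FALSE)
myfun <- function(tdate, aid, chg = chgdf, inc = incdf, fdays = 30, bdays = 30) {
  chg2 <- chg[chg$ID == aid & chg$Submit.Date < tdate + fdays & chg$Submit.Date > tdate - bdays, ]
  nrow(chg2)
}

# 1) apply(): bdays and fdays after FUN land in `...`, and the wrapper
#    passes them on to myfun() (the date still needs as.Date())
via_apply <- apply(incdf, 1,
                   function(row, ...) myfun(as.Date(row[["Submit.Date"]]), row[["ID"]], ...),
                   bdays = 50, fdays = 50)

# 2) mapply(): no matrix conversion, so tdate arrives as a Date;
#    MoreArgs is added to every call
via_mapply <- mapply(myfun, incdf$Submit.Date, incdf$ID,
                     MoreArgs = list(bdays = 50, fdays = 50))

via_apply   # 2 3 0 0
via_mapply  # 2 3 0 0
```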
Upvotes: 1