I have a function:
myFun <- function (x, y)
{
}
It's intended to process a column of a data frame:
myFun(dataFrame$Column, anotherParameterValue)
dataFrame$Column is a factor with 4 levels. The function recognizes it correctly and works as expected. I attached an image of the environment data from the debugger (breakpoint on the first line inside the function).
It also works if passed by index:
myFun(dataFrame[1], anotherParameterValue)
But, if I code:
apply(dataFrame, 2, myFun, y = anotherParameterValue)
The data passed to the function in 'x' is very different:
I suppose it must be something I'm not understanding in 'apply'...
If you need the code inside my function, tell me, but I don't think it's necessary, since the problem already shows in the data received through the parameters.
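As explained in the comments, apply is for objects of class matrix. R will happily and silently convert your data frame to a matrix (via as.matrix) before calling your function. A matrix can hold only one type, so a frame with any non-numeric column, such as a factor, becomes a character matrix.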
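To see the coercion on its own, here is a minimal sketch. The frame df and its columns n and f are made up for illustration and are not from the question:

```r
# Hypothetical toy frame: one numeric column, one factor column
df <- data.frame(n = c(1.5, 2.5), f = factor(c("a", "b")))

# apply() on a data frame goes through as.matrix() first
m <- as.matrix(df)
typeof(m)                          # "character": one type for the whole matrix

# Classes as seen by a function called through apply() vs. sapply()
via_apply  <- apply(df, 2, class)  # every column arrives as "character"
via_sapply <- sapply(df, class)    # columns keep their own classes
```

Because sapply and lapply iterate over the columns of the frame itself, the function receives each column with its original class.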
A working example:
set.seed(42)
quux <- data.frame(int1=sample(1000,3), int2=sample(1000,3), num3=runif(3), num4=runif(3)) |>
  transform(fctr5 = factor(int1), chr6 = as.character(int2))
quux
# int1 int2 num3 num4 fctr5 chr6
# 1 561 153 0.7365883 0.7050648 561 153
# 2 997 74 0.1346666 0.4577418 997 74
# 3 321 228 0.6569923 0.7191123 321 228
myfun <- function(z, y = 0) y + mean(z)
myfun(quux$int2, 1000)
# [1] 1151.667
apply(quux, 2, myfun, y = 1000)
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
# argument is not numeric or logical: returning NA
# int1 int2 num3 num4 fctr5 chr6
# NA NA NA NA NA NA
If we debug myfun and step into what's going on, we'll immediately see a problem:
debug(myfun)
apply(quux, 2, myfun, y = 1000)
# debugging in: FUN(newX[, i], ...)
# debug at #1: y + mean(z)
y
# [1] 1000
z
# [1] "561" "997" "321"
You can continue through each call to myfun, once per column. You'll find that they are all class character.
It may seem "obvious" that you cannot calculate anything on strings. Factors are trickier: some math operations happen to work on them (mean does not), but you should not rely on that. Depending on the function, the calculation might use the integer encoding of the factor or the string representation of its levels, and those are very different things.
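A small sketch of those two views of one factor. The vector f is a made-up example:

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

codes  <- as.integer(f)                # internal level codes: 1 2 1
labels <- as.numeric(as.character(f))  # the values the labels spell out: 10 20 10

mean(codes)    # averages the codes, probably not what you want
mean(labels)   # averages the labelled values
```

If a factor really holds numbers, converting through as.character first is the usual way to get them back.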
How do we fix this? Subset the frame so that you're only operating on the number-like columns.
isnum <- sapply(quux, is.numeric)
isnum
# int1 int2 num3 num4 fctr5 chr6
# TRUE TRUE TRUE TRUE FALSE FALSE
apply(quux[,isnum], 2, myfun, y = 1000)
# int1 int2 num3 num4
# 1626.333 1151.667 1000.509 1000.627
FYI, apply itself is not necessary. We can also use lapply or sapply here, depending on what you plan to do with the return value. For example, if you just need the averages as above, use
sapply(quux[,isnum], myfun, y = 1000)
# int1 int2 num3 num4
# 1626.333 1151.667 1000.509 1000.627
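To make the difference in return values concrete, here is a sketch on a made-up frame. The names nums and addmean are illustrative, and vapply is shown only as a stricter alternative that the answer above does not use:

```r
# Hypothetical numeric frame and a function shaped like myfun
nums <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
addmean <- function(z, y = 0) y + mean(z)

res_list <- lapply(nums, addmean, y = 1000)   # list with one element per column
res_vec  <- sapply(nums, addmean, y = 1000)   # simplified to a named numeric vector

# vapply states the expected result type up front and errors on a mismatch
res_safe <- vapply(nums, addmean, numeric(1), y = 1000)
```

The list form is convenient for assigning back into a frame, while the vector form is easier to read or pass on.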
But if you want to replace the frame's values (for some reason ... work with me), you might do:
quux[isnum] <- lapply(quux[isnum], myfun, y = 1000)
quux
# int1 int2 num3 num4 fctr5 chr6
# 1 1626.333 1151.667 1000.509 1000.627 561 153
# 2 1626.333 1151.667 1000.509 1000.627 997 74
# 3 1626.333 1151.667 1000.509 1000.627 321 228
Or if you wanted to append the columns to quux, then
# (starting with the original quux)
isnum_ch <- names(isnum)[isnum]
isnum_ch <- paste0(isnum_ch, "_new")
isnum_ch
# [1] "int1_new" "int2_new" "num3_new" "num4_new"
cbind(quux, setNames(lapply(quux[isnum], myfun, y = 500), isnum_ch))
# int1 int2 num3 num4 fctr5 chr6 int1_new int2_new num3_new num4_new
# 1 561 153 0.7365883 0.7050648 561 153 1126.333 651.6667 500.5094 500.6273
# 2 997 74 0.1346666 0.4577418 997 74 1126.333 651.6667 500.5094 500.6273
# 3 321 228 0.6569923 0.7191123 321 228 1126.333 651.6667 500.5094 500.6273
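Since the column in the question is a factor, the same subsetting pattern should carry over with is.factor. This is only a sketch: the frame df and the function n_levels are hypothetical stand-ins for the real data and myFun.

```r
# Hypothetical frame with two factor columns and one numeric column
df <- data.frame(grp  = factor(c("a", "b", "a", "c")),
                 size = factor(c("S", "L", "S", "S")),
                 val  = c(1, 2, 3, 4))

# Stand-in for a function that expects a factor
n_levels <- function(x, y = 0) y + nlevels(x)

isfac <- sapply(df, is.factor)
res_ok  <- sapply(df[isfac], n_levels, y = 0)  # sees real factors
res_bad <- apply(df, 2, n_levels, y = 0)       # sees character vectors, which have no levels
```

Here res_bad is 0 for every column, because the function only ever receives strings.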