Kaikus

Reputation: 1095

Misunderstanding the use of 'apply'

I have a function:

myFun <- function (x, y)
{
}

It's intended to process a column of a data frame:

myFun(dataFrame$Column, anotherParameterValue)

dataFrame$Column is a factor with 4 levels. The function receives it correctly and works as expected. I've attached a screenshot of the environment from the debugger (breakpoint on the first line of the function):

[screenshot: debugger environment]

It also works if passed by index:

myFun(dataFrame[1], anotherParameterValue)

[screenshot: debugger environment]

But, if I code:

apply(dataFrame, 2, myFun, y = anotherParameterValue)

The data passed to the function in 'x' is very different:

[screenshot: debugger environment]

I suppose there must be something I'm not understanding about 'apply'...

If you need the code inside my function, tell me, but I don't think it's necessary, as the problem already shows in the data received through the parameters.

Upvotes: 0

Views: 44

Answers (1)

r2evans

Reputation: 160952

As explained in the comments, apply is meant for matrices (and arrays). Given a data frame, R will happily and silently convert it to a matrix first. A matrix can hold only one type, so a frame with any non-numeric column becomes a character matrix.
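A minimal sketch of that conversion, using a small hypothetical frame df (not the asker's data). as.matrix is the coercion apply performs on a data frame, and a single non-numeric column is enough to turn the whole matrix into character:

```r
# hypothetical frame: one numeric column, one factor column
df <- data.frame(n = c(1.5, 2.5, 3.5), f = factor(c("a", "b", "a")))

m <- as.matrix(df)   # the coercion apply() performs on a data frame
typeof(m)            # "character": the numeric column was converted too
m[, "n"]             # strings, no longer numbers

# a frame with only numeric columns keeps its numeric type
num_only <- as.matrix(df["n"])
typeof(num_only)     # "double"
```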

A working example:

set.seed(42)
quux <- data.frame(int1=sample(1000,3), int2=sample(1000,3), num3=runif(3), num4=runif(3)) |>
  transform(fctr5 = factor(int1), chr6=as.character(int2))
quux
#   int1 int2      num3      num4 fctr5 chr6
# 1  561  153 0.7365883 0.7050648   561  153
# 2  997   74 0.1346666 0.4577418   997   74
# 3  321  228 0.6569923 0.7191123   321  228
myfun <- function(z, y = 0) y + mean(z)
myfun(quux$int2, 1000)
# [1] 1151.667
apply(quux, 2, myfun, y = 1000)
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
# Warning in mean.default(z) :
#   argument is not numeric or logical: returning NA
#  int1  int2  num3  num4 fctr5  chr6 
#    NA    NA    NA    NA    NA    NA 

If we debug myfun and step into what's going on, we'll immediately see a problem:

debug(myfun)
apply(quux, 2, myfun, y = 1000)
# debugging in: FUN(newX[, i], ...)
# debug at #1: y + mean(z)
y
# [1] 1000
z
# [1] "561" "997" "321"

You can continue through each call to myfun, once per column. You'll find that they are all class character.
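The same check can be made without the debugger. This is only a sketch on a hypothetical two-column frame: asking apply for the class of each column reports what the function actually receives, while sapply reports the frame's real column classes.

```r
# hypothetical frame with one numeric and one factor column
df <- data.frame(n = c(1.5, 2.5), f = factor(c("a", "b")))

seen_by_apply  <- apply(df, 2, class)   # both "character": what FUN receives
seen_by_sapply <- sapply(df, class)     # "numeric" and "factor": the real classes

seen_by_apply
seen_by_sapply
```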

It seems "obvious" that one cannot do arithmetic on strings. Some math operations do happen to work on factors (mean is not one of them), but they shouldn't be relied on: depending on the function, it might operate on the factor's integer encoding or on the string representation of its levels, which are very different things.
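To make the two readings of a factor concrete, here is a small sketch. The vector f is built by hand for illustration and reuses the values shown for fctr5:

```r
f <- factor(c("561", "997", "321"))

levels(f)                              # "321" "561" "997" (sorted as strings)

codes  <- as.integer(f)                # 2 3 1: positions within the levels
values <- as.numeric(as.character(f))  # 561 997 321: what the labels spell out

mean(codes)    # 2, the mean of the internal codes
mean(values)   # 626.3333, the mean of the values themselves
```

A function that silently uses the integer codes would return 2 here, which looks like a valid answer but has nothing to do with the data.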

How do we fix this? Subset the frame so that you're only operating on the number-like columns.

isnum <- sapply(quux, is.numeric)
isnum
#  int1  int2  num3  num4 fctr5  chr6 
#  TRUE  TRUE  TRUE  TRUE FALSE FALSE 
apply(quux[,isnum], 2, myfun, y = 1000)
#     int1     int2     num3     num4 
# 1626.333 1151.667 1000.509 1000.627 

FYI, apply itself is not necessary here; we can use lapply or sapply instead, depending on what you plan to do with the return value. For example, if you just need the averages as above, use

sapply(quux[,isnum], myfun, y = 1000)
#     int1     int2     num3     num4 
# 1626.333 1151.667 1000.509 1000.627 
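If you would rather have R verify the shape of each result, vapply is a stricter alternative to sapply: it fails with an error when a call does not return the declared type and length. A sketch on a hypothetical stand-in frame dat, with myfun redefined so the block runs on its own:

```r
# hypothetical stand-in data; myfun is the same function as above
dat <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60), s = c("x", "y", "z"))
myfun <- function(z, y = 0) y + mean(z)

# logical(1) and numeric(1) declare that each call must return one value of that type
isnum <- vapply(dat, is.numeric, logical(1))
res <- vapply(dat[isnum], myfun, numeric(1), y = 1000)
res   # a = 1002, b = 1030
```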

But if you want to replace the frame's values (for some reason ... work with me), you might do:

quux[isnum] <- lapply(quux[isnum], myfun, y = 1000)
quux
#       int1     int2     num3     num4 fctr5 chr6
# 1 1626.333 1151.667 1000.509 1000.627   561  153
# 2 1626.333 1151.667 1000.509 1000.627   997   74
# 3 1626.333 1151.667 1000.509 1000.627   321  228

Or, if you wanted to append the new columns to quux instead:

# (starting with the original quux)
isnum_ch <- names(isnum)[isnum]
isnum_ch <- paste0(isnum_ch, "_new")
isnum_ch
# [1] "int1_new" "int2_new" "num3_new" "num4_new"
cbind(quux, setNames(lapply(quux[isnum], myfun, y = 500), isnum_ch))
#   int1 int2      num3      num4 fctr5 chr6 int1_new int2_new num3_new num4_new
# 1  561  153 0.7365883 0.7050648   561  153 1126.333 651.6667 500.5094 500.6273
# 2  997   74 0.1346666 0.4577418   997   74 1126.333 651.6667 500.5094 500.6273
# 3  321  228 0.6569923 0.7191123   321  228 1126.333 651.6667 500.5094 500.6273

Upvotes: 1
