coding_heart

Reputation: 1295

Applying function to data.frame generates NAs while applying it to columns works

I would like to apply a weight to a data frame in R that consists of variables that are numeric as well as factors. When I create a function that transforms factors into numerics and then weights the variable and apply this to any given column, it works well. However, when I apply it to the data.frame using the apply function, it generates NAs. For example:

set.seed(123)
frame <- data.frame(x = sample(1:100,10), y = c(rep("1",5), rep("2",5)))

weights <- 10
weight.fun <- function(x){
    x <- if(class(x) == "numeric" | class(x) == "integer"){x} else {as.numeric(levels(x))[x]}
    x*weights
}

weight.fun(frame$x)
# [1] 290 790 410 860 910  50 500 830 510 420
weight.fun(frame$y)
# [1] 10 10 10 10 10 20 20 20 20 20
apply(frame,2,weight.fun)
#        x  y
#  [1,] NA NA
#  [2,] NA NA
#  [3,] NA NA
#  [4,] NA NA
#  [5,] NA NA
#  [6,] NA NA
#  [7,] NA NA
#  [8,] NA NA
#  [9,] NA NA
# [10,] NA NA

Any idea on why this happens?

Upvotes: 0

Views: 96

Answers (2)

Jeffrey Evans

Reputation: 2397

It looks like the issue is in your function rather than in apply: the if statement returns NAs because the else branch fails when x is a character vector. Writing the function like this seems to work with apply.

set.seed(123)
frame <- data.frame(x = sample(1:100,10), y = c(rep("1",5), rep("2",5)))

weight.fun <- function(x, w = 10){
  # Convert non-numeric input to numeric before weighting
  if(!is.numeric(x)) {
    # Factors: go through the labels, not the internal integer codes
    if(is.factor(x)) { x <- as.numeric(as.character(x)) }
    else if(is.character(x)) { x <- as.numeric(x) }
  }
  return(x * w)
}

apply(frame, MARGIN = 2, FUN = weight.fun) 

Upvotes: 0

josliber

Reputation: 44309

The operation will work as intended if you use sapply instead of apply:

sapply(frame, weight.fun)
#         x  y
#  [1,] 290 10
#  [2,] 790 10
#  [3,] 410 10
#  [4,] 860 10
#  [5,] 910 10
#  [6,]  50 20
#  [7,] 500 20
#  [8,] 830 20
#  [9,] 510 20
# [10,] 420 20

The reason for this discrepancy is that apply operates on matrices (or arrays). From ?apply:

Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.

Therefore your data frame frame will be converted to a matrix when using apply, meaning the data types for all columns will be forced to be the same (strings in your case):

as.matrix(frame)
#        x    y  
#  [1,] "29" "1"
#  [2,] "79" "1"
#  [3,] "41" "1"
#  [4,] "86" "1"
#  [5,] "91" "1"
#  [6,] " 5" "2"
#  [7,] "50" "2"
#  [8,] "83" "2"
#  [9,] "51" "2"
# [10,] "42" "2"

This explains the unexpected behavior with apply -- weight.fun is getting passed character vectors.
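To make the NA step concrete, here is a minimal sketch of what the else branch of the original weight.fun does with one of those character columns. It uses a small illustrative data frame rather than the sampled one from the question, and it builds y with factor() explicitly so it does not depend on the stringsAsFactors default of your R version:

```r
# Small illustrative data (an assumption, not the sample from the question)
frame <- data.frame(x = c(29L, 5L, 50L),
                    y = factor(c("1", "1", "2")))

# apply(frame, 2, f) hands f the columns of as.matrix(frame)
col_x <- as.matrix(frame)[, "x"]
class(col_x)    # "character"
levels(col_x)   # NULL, because character vectors have no levels

# The else branch of the original weight.fun: as.numeric(NULL) is an
# empty numeric vector, and indexing it by character names that do not
# exist yields one NA per element
res <- as.numeric(levels(col_x))[col_x]
all(is.na(res)) # TRUE
```

So the NAs come from indexing an empty vector, not from the multiplication by weights.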

Meanwhile, sapply operates over lists, which is just what you want because data frames are lists. Using sapply, the type of each column is preserved from the data frame, so weight.fun is first called with an integer vector and then it is called with a factor.
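If you would rather get a data frame back than a matrix, one possible variant is to use lapply and assign into frame[]. The sketch below is only an illustration: the data is a small made-up example, and this weight.fun (with a w argument and an is.factor check) is a simplified stand-in for the function in the question, not the original.

```r
# Simplified stand-in for weight.fun (an assumption, not the original)
weight.fun <- function(x, w = 10) {
  if (is.factor(x)) x <- as.character(x)  # use the labels, not the codes
  as.numeric(x) * w
}

# Small illustrative data; factor() is explicit so the example behaves
# the same regardless of the stringsAsFactors default
frame <- data.frame(x = c(29L, 5L, 50L),
                    y = factor(c("1", "1", "2")))

# lapply visits each column with its own type intact; assigning into
# weighted[] keeps the data.frame structure and column names
weighted <- frame
weighted[] <- lapply(frame, weight.fun)
str(weighted)
```

Like sapply, lapply iterates over the data frame as a list, so no conversion to a character matrix takes place.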

Upvotes: 4
