I would like to apply a weight to a data frame in R whose variables are a mix of numerics and factors. I wrote a function that converts factors to numerics and then weights the variable. Applied to any single column, it works well. However, when I apply it to the whole data frame with the apply function, it generates NAs. For example:
set.seed(123)
frame <- data.frame(x = sample(1:100,10), y = c(rep("1",5), rep("2",5)),
                    stringsAsFactors = TRUE)  # keep y a factor; the default became FALSE in R 4.0.0
weights <- 10
weight.fun <- function(x){
  # Use numeric/integer columns as-is; otherwise assume a factor and convert its labels to numbers
  x <- if(class(x) == "numeric" | class(x) == "integer"){x} else {as.numeric(levels(x))[x]}
  x*weights
}
weight.fun(frame$x)
# [1] 290 790 410 860 910 50 500 830 510 420
weight.fun(frame$y)
# [1] 10 10 10 10 10 20 20 20 20 20
apply(frame,2,weight.fun)
# x y
# [1,] NA NA
# [2,] NA NA
# [3,] NA NA
# [4,] NA NA
# [5,] NA NA
# [6,] NA NA
# [7,] NA NA
# [8,] NA NA
# [9,] NA NA
# [10,] NA NA
Any idea on why this happens?
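It looks like the issue is in your function. When x is a character vector, the else branch of your if statement returns NAs, because a character vector has no levels for as.numeric(levels(x))[x] to look up. That branch assumes a factor, so the function fails for character input, which is what apply gives it. Writing the function like this seems to work with apply: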
set.seed(123)
frame <- data.frame(x = sample(1:100,10), y = c(rep("1",5), rep("2",5)))
weight.fun <- function(x, w = 10){
  if(is.factor(x)) {
    x <- as.numeric(as.character(x))  # factor: convert its labels, not its internal codes
  } else if(is.character(x)) {
    x <- as.numeric(x)                # character: e.g. the columns apply() passes in
  }
  return(x * w)                       # numeric/integer input is weighted unchanged
}
apply(frame, MARGIN = 2, FUN = weight.fun)
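Because apply converts the data frame to a character matrix, this version ends up taking its character branch for both columns. A small sketch of that path, assuming the question's frame with y as a factor; the names m and res are just illustrative. Note that as.matrix() may pad numbers (" 5"), which as.numeric() tolerates because it ignores surrounding whitespace:

```r
set.seed(123)
frame <- data.frame(x = sample(1:100, 10), y = c(rep("1", 5), rep("2", 5)),
                    stringsAsFactors = TRUE)

m <- as.matrix(frame)          # character matrix: every column becomes strings
as.numeric(c(" 5", "42"))      # padding is harmless: leading/trailing spaces are ignored

# What the character branch of the rewritten function computes for each column
res <- apply(m, 2, function(col) as.numeric(col) * 10)
is.matrix(res)                 # apply() returns a matrix, not a data frame
```

If a data frame is needed afterwards, the matrix can be wrapped with as.data.frame(res).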
The operation will work as intended if you use sapply instead of apply:
sapply(frame, weight.fun)
# x y
# [1,] 290 10
# [2,] 790 10
# [3,] 410 10
# [4,] 860 10
# [5,] 910 10
# [6,] 50 20
# [7,] 500 20
# [8,] 830 20
# [9,] 510 20
# [10,] 420 20
The reason for this discrepancy is that apply operates on matrices (or arrays). From ?apply:
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
Therefore your data frame frame will be converted to a matrix when using apply, meaning the data types for all columns will be forced to be the same (strings in your case):
as.matrix(frame)
# x y
# [1,] "29" "1"
# [2,] "79" "1"
# [3,] "41" "1"
# [4,] "86" "1"
# [5,] "91" "1"
# [6,] " 5" "2"
# [7,] "50" "2"
# [8,] "83" "2"
# [9,] "51" "2"
# [10,] "42" "2"
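This explains the unexpected behavior with apply: weight.fun is getting passed character vectors. A character vector has no levels, so as.numeric(levels(x))[x] indexes an empty vector and returns NA for every element.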
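A minimal check of this, assuming the question's frame with y as a factor (stringsAsFactors = TRUE); col is just an illustrative name for one column of the converted matrix:

```r
set.seed(123)
frame <- data.frame(x = sample(1:100, 10), y = c(rep("1", 5), rep("2", 5)),
                    stringsAsFactors = TRUE)

# apply() converts frame to a character matrix first, so both columns report "character"
apply(frame, 2, class)

# levels() of a character vector is NULL, so the factor idiom has nothing to index
col <- as.matrix(frame)[, "y"]
levels(col)                      # NULL
as.numeric(levels(col))[col]     # all NA: numeric(0) indexed by a name has no match
```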
Meanwhile, sapply operates over lists, which is just what you want because data frames are lists. Using sapply, the type of each column is preserved from the data frame, so weight.fun is first called with an integer vector and then it is called with a factor.
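sapply simplifies its result to a matrix. If you would rather keep a data frame, one option is to assign the lapply result back into the columns with frame[] <- .... A sketch, assuming a lightly simplified copy of the question's weight.fun (is.numeric() in place of the two class() comparisons) and a hypothetical copy named weighted:

```r
set.seed(123)
frame <- data.frame(x = sample(1:100, 10), y = c(rep("1", 5), rep("2", 5)),
                    stringsAsFactors = TRUE)
weights <- 10
weight.fun <- function(x) {
  # numeric columns as-is; otherwise assume a factor and convert its labels
  x <- if (is.numeric(x)) x else as.numeric(levels(x))[x]
  x * weights
}

# sapply() walks the columns of the list-like data frame, so types are preserved
sapply(frame, class)   # x: "integer", y: "factor"

# lapply() + [] assignment keeps a data.frame with the original column names
weighted <- frame
weighted[] <- lapply(frame, weight.fun)
str(weighted)
```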