Reputation: 3
I have the following problem: I'm extracting a vector that contains only NAs and I want to sum it. But instead of 0, the system returns an error. This seems to happen because I pass a parameter variable into the function.
Consider this excerpt of code:
ConsData is a data.frame with 5 columns and multiple rows.
Assume the columns are A, B, C, D, E; column D contains only NA.
WorkSum <- function(var) {
Sumer <- (sum(ConsData[VARIABLE], na.rm = TRUE))
}
WorkSum(D)
The following error is produced:
Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables
However, if I don't parametrise and instead rewrite this line as follows, it all works.
Sumer <- (sum(ConsData$D, na.rm = TRUE))
Upvotes: 0
Views: 120
Reputation: 3
Thank you for the responses and your assistance. From this I gather that because the vector is full of NA, read.csv picks it up as logical instead of numeric, due to the automatic type detection. The solution I went with is to specify colClasses when reading the data, which forces the vector to be numeric, and everything worked.
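A minimal sketch of the colClasses approach described above. The inline CSV text and the names csv_text, auto, and forced are made up for illustration; the real file and its column list are not shown in this thread.

```r
# Hypothetical CSV supplied inline; column D holds only NA.
csv_text <- "A,D\n1,NA\n2,NA\n3,NA\n"

# Default type detection: an all-NA column is read as logical.
auto <- read.csv(text = csv_text)
class(auto$D)

# Declaring the classes up front makes D numeric.
forced <- read.csv(text = csv_text,
                   colClasses = c(A = "numeric", D = "numeric"))
class(forced$D)

# Summing the single-column data frame now works and gives 0.
sum(forced["D"], na.rm = TRUE)
```

With a real file, the text argument would be replaced by the file path, and colClasses would list the actual columns.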
Thanks again
Andrzej
Upvotes: 0
Reputation: 2864
Let's reproduce your scenario:
ConsData <- data.frame(
A = c(1, 2, NA),
D = replicate(3, NA)
)
If you want to keep the same function, you need to modify it as @markus already pointed out:
# make var and VARIABLE consistent and return a value
WorkSum <- function(var) {
sum(ConsData[var], na.rm = TRUE)
}
With this version, the column name has to be passed as a string:
WorkSum("A") # working fine
WorkSum("D") # producing the error mentioned in question
The actual question is why the commands
sum(ConsData['A'], na.rm = TRUE)
sum(ConsData$D, na.rm = TRUE)
work fine while the following does not:
sum(ConsData['D'], na.rm = TRUE)
You can take a look at their structure to have a better idea:
str(ConsData['A']) # NA is in a variable of numeric type here
# 'data.frame': 3 obs. of 1 variable:
# $ A: num 1 2 NA
str(ConsData$D) # plain vector
# logi [1:3] NA NA NA
str(ConsData['D']) # NAs are in a variable of logical type
# 'data.frame': 3 obs. of 1 variable:
# $ D: logi NA NA NA
The function sum with na.rm = TRUE behaves the way you expect when a vector or a single-numeric-column data frame is passed in. However, it gives this error when a single-logical-column data frame is passed in. We can conclude that the function checks the column types when its argument is a data frame and accepts only numeric variables, as stated in the error message. This behavior makes sense, so you simply need to adjust your code with it in mind.
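One possible adjustment, sketched here as an assumption rather than something proposed in the thread, is to extract the column with `[[` instead of `[`. That hands sum a plain vector, so the type check applied to data frames never comes into play, whatever the column's type.

```r
ConsData <- data.frame(
  A = c(1, 2, NA),
  D = replicate(3, NA)
)

# `[[` returns the column as a plain vector, not a one-column data frame.
WorkSum <- function(var) {
  sum(ConsData[[var]], na.rm = TRUE)
}

WorkSum("A")  # sums 1 and 2, ignoring the NA
WorkSum("D")  # all values are NA, so nothing is left to add
```

This mirrors why sum(ConsData$D, na.rm = TRUE) worked in the question: both `$` and `[[` yield the vector itself.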
Upvotes: 1