Sum function in R bombs out when passing a parametrised variable of NA's

I am extracting a vector that contains only NAs and I want to sum it. Instead of returning 0, R throws an error. This seems to happen because I pass the column into the function as a parameter.

Consider this excerpt of code:

ConsData is a data.frame with 5 columns and multiple rows. Assume the columns are A, B, C, D and E; column D contains only NA.

WorkSum <- function(var) {
  Sumer <- (sum(ConsData[VARIABLE], na.rm = TRUE))
}
WorkSum(D)

The following Error is produced:

Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric variables

However, if I don't parametrise and rewrite this line as follows, it all works.

 Sumer <- (sum(ConsData$D, na.rm = TRUE))

Answers (2)

Thank you for the responses and your assistance. From this I gather that because the vector is full of NAs, read.csv's automatic type detection reads it as logical instead of numeric. The solution I went with is to specify colClasses when reading the data, which forces the vector to be numeric, and everything then worked.
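
A minimal sketch of that fix, assuming a small inline CSV in place of the real file. The column names A and D follow the question; csv_text, auto and typed are hypothetical names and the data is made up for illustration.

```r
# Hypothetical inline CSV standing in for the real file; column D holds only NA.
csv_text <- "A,D\n1,NA\n2,NA\n"

# Automatic type detection: an all-NA column is read as logical.
auto <- read.csv(text = csv_text)
class(auto$D)   # "logical"

# Forcing the column types makes D numeric.
typed <- read.csv(text = csv_text, colClasses = c(A = "numeric", D = "numeric"))
class(typed$D)  # "numeric"

# Summing the one-column data frame now works and gives 0.
sum(typed["D"], na.rm = TRUE)
```

Naming the columns in colClasses avoids depending on column order, which may matter if the real file has more columns than this sketch.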

Thanks again

Andrzej

ozanstats

Let's reproduce your scenario:

ConsData <- data.frame(
  A = c(1, 2, NA),
  D = replicate(3, NA)
)

If you want to keep the same function, you need to modify it as @markus already pointed out:

# make var and VARIABLE consistent and return a value
WorkSum <- function(var) {
  sum(ConsData[var], na.rm = TRUE)
}

With this version, the column name must be passed as a string:

WorkSum("A") # working fine
WorkSum("D") # producing the error mentioned in question

The actual question is why the commands

sum(ConsData['A'], na.rm = TRUE)
sum(ConsData$D, na.rm = TRUE)

work fine but the following does not:

sum(ConsData['D'], na.rm = TRUE)

You can take a look at their structure to have a better idea:

str(ConsData['A']) # NA is in a variable of numeric type here
# 'data.frame': 3 obs. of  1 variable:
#  $ A: num  1 2 NA

str(ConsData$D) # plain vector
# logi [1:3] NA NA NA

str(ConsData['D']) # NAs are in a variable of logical type
# 'data.frame': 3 obs. of  1 variable:
#  $ D: logi  NA NA NA

The function sum with na.rm = TRUE behaves the way you expect when it is given a plain vector or a data frame whose single column is numeric. It raises this error when it is given a data frame whose single column is logical. We can conclude that, when the argument is a data frame, sum checks the column types and only accepts numeric variables, as the error message states. ConsData$D avoids the check because it returns a plain vector rather than a data frame. You simply need to adjust your code with this behaviour in mind.
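
One way to sidestep the data-frame check, offered as a sketch rather than the only fix: extract the column as a plain vector with [[ ]], so that sum sees the same kind of object as ConsData$D. WorkSumVec is a hypothetical name for this variant, and it assumes ConsData is defined as in the example above.

```r
ConsData <- data.frame(
  A = c(1, 2, NA),
  D = c(NA, NA, NA)  # all-NA column, stored as logical
)

# Hypothetical variant of WorkSum: [[ ]] returns the column as a plain vector,
# so sum() never receives a data frame.
WorkSumVec <- function(var) {
  sum(ConsData[[var]], na.rm = TRUE)
}

WorkSumVec("A")  # 3
WorkSumVec("D")  # 0
```

The column name is still passed as a string, so existing calls such as WorkSum("D") would only need the function body changed.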
