Tim

Reputation: 367

Ignore NA's in sapply function

I am using R and have searched around for an answer. I have seen similar questions, but none of the solutions worked for my specific problem.

In my data set I am using NAs as placeholders because I will return to them once part of my analysis is done. For now, I would like to do all my calculations as if the NAs weren't there.

Here's my issue, with an example data table:

ROCA = c(1,3,6,2,1,NA,2,NA,1,NA,4,NA)
ROCA <- data.frame (ROCA=ROCA)       # converting it just because that is the format of my original data

#Now my function
exceedes <- function (L=NULL, R=NULL, na.rm = T)
 {
    if (is.null(L) | is.null(R)) {
        print ("mycols: invalid L,R.")
        return (NULL)               
    }
    test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))
  test1 <- sapply(L,function(x) if((x)> test){1} else {0})
  return (test1)
}
L=ROCA[,1]
R=.5
ROCA$newcolumn <- exceedes(L,R)
names(ROCA)[names(ROCA)=="newcolumn"]="Exceedes1"

I am getting the error:

Error in if ((x) > test) { : missing value where TRUE/FALSE needed 

The problem seems to be in the sapply call. Any ideas on how to ignore those NAs? I would try na.omit if I could put all the NAs back exactly where they were before, but I am not sure how to do that.

Upvotes: 5

Views: 7812

Answers (3)

Joshua Ulrich

Reputation: 176648

There's no need for sapply and your anonymous function because > is already vectorized.

It also seems really odd to specify default argument values that are invalid. My guess is that you're using that as a kludge instead of using the missing function. It's also good practice to throw an error rather than return NULL because you would still have to try to catch when the function returns NULL.

exceedes <- function (L, R, na.rm=TRUE)
{
  if(missing(L) || missing(R)) {
    stop("L and R must be provided")
  }
  test <- mean(L,na.rm=TRUE)-R*sd(L,na.rm=TRUE)
  as.numeric(L > test)
}

ROCA <- data.frame(ROCA=c(1,3,6,2,1,NA,2,NA,1,NA,4,NA))
ROCA$Exceeds1 <- exceedes(ROCA[,1],0.5)
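As a quick check of the vectorized comparison, here is a minimal sketch on the question's example vector. The names threshold and flags are only illustrative. Because NA > threshold is NA, the missing values stay in their original positions and the result lines up with the input rows.

```r
# Example vector from the question, with NAs used as placeholders.
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
R <- 0.5

# The threshold is computed from the non-missing values only.
threshold <- mean(L, na.rm = TRUE) - R * sd(L, na.rm = TRUE)

# `>` compares element by element; an NA input gives an NA output.
flags <- as.numeric(L > threshold)
print(flags)
# values: 0 1 1 1 0 NA 1 NA 0 NA 1 NA
```

Since the result has the same length as the input, it can be assigned straight into a data frame column without any na.omit step.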

Upvotes: 5

Tommy

Reputation: 40813

Do you want NAs in the result? That is, do you want the rows to line up?

It seems like just returning L > test would work then. And adding the column can be simplified too (I suspect "Exceedes1" is in a variable somewhere).

exceedes <- function (L=NULL, R=NULL, na.rm = T)
 {
    if (is.null(L) | is.null(R)) {
        print ("mycols: invalid L,R.")
        return (NULL)               
    }
    test <-(mean(L, na.rm=TRUE)-R*sd(L,na.rm=TRUE))

    L > test
}
L=ROCA[,1]
R=.5
ROCA[["Exceedes1"]] <- exceedes(L,R)
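A logical column like this one is usually enough on its own. The sketch below assumes the question's example data and uses the illustrative names threshold, n_exceeding and as_01; it shows counting the TRUE values while skipping the NAs, and converting to 0/1 only if that form is really needed.

```r
# Example data from the question.
ROCA <- data.frame(ROCA = c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA))
L <- ROCA[["ROCA"]]

# Threshold from the non-missing values (R = 0.5 as in the question).
threshold <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

# Logical column: TRUE/FALSE where L is known, NA where L is NA.
ROCA[["Exceedes1"]] <- L > threshold

# Count the rows above the threshold, ignoring the placeholders.
n_exceeding <- sum(ROCA[["Exceedes1"]], na.rm = TRUE)

# Optional: the same information as 1/0/NA integers.
as_01 <- as.integer(ROCA[["Exceedes1"]])
```

Keeping the column logical means it can also be used directly for subsetting, for example with which(ROCA[["Exceedes1"]]).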

Upvotes: 2

jimmyb

Reputation: 4387

This statement is the problem, because if needs a single TRUE or FALSE and (x) > test is NA whenever x is NA:

test1 <- sapply(L,function(x) if((x)> test){1} else {0})

Try:

test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
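To see the difference, here is a small sketch on the question's example vector (msg, with_guard and without_guard are illustrative names). It captures the error that if raises on an NA condition, then compares the ifelse call above with a version that drops the is.na guard. As far as I can tell the guard is optional, since ifelse already returns NA wherever its condition is NA, but it does make the intent explicit.

```r
# Example vector and threshold from the question (R = 0.5).
L <- c(1, 3, 6, 2, 1, NA, 2, NA, 1, NA, 4, NA)
test <- mean(L, na.rm = TRUE) - 0.5 * sd(L, na.rm = TRUE)

# `if` requires a single TRUE or FALSE. NA > test evaluates to NA,
# so `if` stops with an error; tryCatch captures its message here.
msg <- tryCatch(
  if (NA > test) 1 else 0,
  error = function(e) conditionMessage(e)
)

# ifelse() works element by element and keeps NA where the condition is NA.
with_guard    <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
without_guard <- ifelse(L > test, 1, 0)

# Both versions give the same values, with NAs in the original positions.
same_result <- isTRUE(all.equal(with_guard, without_guard))
```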

Upvotes: 4
