nancy

Reputation: 9

IF statement causing wrong results in function

I am working on a Coursera assignment in R. My code works correctly for assignment 1, where I write a function in R to rank hospitals; in assignment 2 I have to add a few IF or IF ELSE IF statements to the function already written.

The function produces a final data frame.

  1. If input argument num == "best", the function returns the first row of the final data frame
  2. If input argument num == "worst", the function returns the last row of the final data frame
  3. If input argument num > max row count of the final data frame, the function returns NA
  4. If input argument num < max row count, the function returns that row from the data frame

Now, the if statements work correctly only for scenarios 3 and 4. For scenarios 1 and 2, the function returns NA, which is the return value of scenario 3.

There is something wrong with the way I am writing the IF statements (it could be the sequence, the condition, or something else), because of which I get an NA return value for scenarios 1 and 2.

Code below, TIA

outcomeDF<-outcome[,c(2,7,n)]
names(outcomeDF)<-c("Hospital","State","Outcomess")
finalDF<-filter(outcomeDF,outcomeDF$State==sta)
     
DFSlist<-arrange(finalDF,finalDF$State,finalDF$Outcomess,finalDF$Hospital)

if (num > nrow(DFSlist)) print ("NA")
  else if (num < nrow(DFSlist)) c<-(DFSlist[num,])
  else if (num =="best")c<-(DFSlist[1,])
  else (num =="worst")c<-(DFSlist[(nrow(DFSlist)),])
return(c)

Upvotes: 0

Views: 252

Answers (2)

Len Greski

Reputation: 10855

The second part of the Johns Hopkins University Coursera R Programming course assignment 3 is a function called rankhospital().

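One of the reasons the code in the original post fails is that it compares num directly with nrow(DFSlist) before checking for the keywords. When num is "best" or "worst", R coerces the row count to a character string and compares the two as strings. Letters sort after digits in the usual collation order, so the first condition in the OP evaluates to TRUE, and the code prints "NA" instead of returning the first or last row of the DFSlist data frame.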
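The coercion can be checked at the console. This is a minimal sketch, assuming num arrives as a character string; the row count of 10 is made up for illustration, and the result of a string comparison depends on the locale.

```r
# Hypothetical values for illustration only
num <- "best"
n_rows <- 10L

# Comparing a character string with a number coerces the number to a string,
# so this is the string comparison "best" > "10", not a numeric comparison.
first_condition <- num > n_rows
print(first_condition)

# Converting explicitly shows that "best" has no numeric value.
as_number <- suppressWarnings(as.numeric(num))
print(is.na(as_number))
```

In locales where letters sort after digits, the first value printed is TRUE, which is why the keyword checks have to run before any comparison with the row count.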

The rankhospital() function takes two required arguments, state and outcome, and one argument with a default value, num.

The data for the function comes from the 2012 outcome of care measurements in the Hospital Compare database provided by the U.S. Department of Health and Human Services.

The stub of the required function looks like this.

rankhospital <- function(state, outcome, num="best") {
   # answer goes here
}

The function needs to do three things:

  1. Read the hospital outcomes data
  2. Validate the input arguments (e.g. check for invalid state, etc.)
  3. Process the data and return the n-th ranked hospital in the state specified in the state argument for one of three outcomes (heart attack, heart failure, or pneumonia)
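Step 2 can be sketched on a toy data frame. Everything here is an assumption made for illustration: the helper name check_args, the column names, the data, and the error messages are hypothetical and are not taken from the assignment.

```r
# Toy stand-in for the hospital data; all values are made up.
hospitals <- data.frame(
  Hospital = c("A", "B", "C"),
  State = c("TX", "TX", "MD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop early when an argument is not usable.
check_args <- function(data, state, num) {
  if (!state %in% data$State) {
    stop("invalid state")
  }
  # num must be one of the two keywords or a single rank of at least 1
  is_keyword <- is.character(num) && length(num) == 1 &&
    num %in% c("best", "worst")
  is_rank <- is.numeric(num) && length(num) == 1 && num >= 1
  if (!is_keyword && !is_rank) {
    stop("invalid num")
  }
  invisible(TRUE)
}

check_args(hospitals, "TX", "best")  # passes silently
check_args(hospitals, "MD", 2)       # passes silently

# An unknown state raises an error; capture its message instead of halting.
result <- tryCatch(check_args(hospitals, "ZZ", 1),
                   error = function(e) conditionMessage(e))
print(result)
```

Failing early with stop() keeps the ranking logic free of special cases for bad input.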

The question in the OP asks how to use the third argument in the function, num, to return best, worst, or a numeric rank.

Once the data has been subset to the correct state, and sorted per the instructions, one way to process the num argument is as follows.

# sort & subset here
sortedSubset <- # code goes here, includes hospital, state, other variables
                # sorted in required order (outcome then hospital name)

# return hospital name, given num argument
if (num == "best") {
    return(sortedSubset[1, 1])
} else if (num == "worst") {
    return(sortedSubset[nrow(sortedSubset), 1])
} else if (as.numeric(num) > nrow(sortedSubset)) {
    return("NA")
} else {
    return(sortedSubset[as.numeric(num), 1])
}
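The branch order above can be tried on a toy data frame. This is a sketch under stated assumptions, not the assignment solution: pick_hospital, the column names, and the data are hypothetical, and it returns a true missing value (NA_character_) for an out-of-range rank rather than the string "NA", which is a choice made for this sketch only.

```r
# Toy data frame, assumed to be sorted already (outcome, then hospital name).
sorted_subset <- data.frame(
  Hospital = c("ALPHA", "BETA", "GAMMA"),
  Rate = c(8.1, 9.4, 12.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: handle the keywords before any numeric comparison.
pick_hospital <- function(df, num = "best") {
  if (identical(num, "best")) {
    return(df$Hospital[1])
  }
  if (identical(num, "worst")) {
    return(df$Hospital[nrow(df)])
  }
  # Anything else is treated as a rank; non-numeric input becomes NA.
  rank <- suppressWarnings(as.numeric(num))
  if (is.na(rank) || rank < 1 || rank > nrow(df)) {
    return(NA_character_)
  }
  df$Hospital[rank]
}

print(pick_hospital(sorted_subset, "best"))
print(pick_hospital(sorted_subset, "worst"))
print(pick_hospital(sorted_subset, 2))
print(pick_hospital(sorted_subset, 5000))
```

Because the keyword checks come first, the numeric comparison only ever sees values that were meant to be ranks.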

When working correctly, the function produces the following answers to the test cases that are provided with the assignment instructions.

> source("./rprogramming/rankhospital.R")
> rankhospital("TX","heart failure",4)
[1] "DETAR HOSPITAL NAVARRO"
> rankhospital("MD", "heart attack","worst")
[1] "HARFORD MEMORIAL HOSPITAL"
> rankhospital("MN","heart attack",5000)
[1] "NA"

NOTE: Posting complete solutions to programming assignments in the JHU Data Science Specialization is a violation of the Coursera Honor Code. Therefore, I explain where the OP code is broken without posting a complete solution for the rankhospital() function.

Upvotes: 1

Ken Osborne

Reputation: 91

Have you tried wrapping the if statements in braces?

Also, c() is the way we make things into vectors, so I'd highly recommend against naming a variable c.

  # DF manipulation
  outcomeDF <- outcome[, c(2, 7, n)]
  names(outcomeDF) <- c("Hospital", "State", "Outcomess")
  finalDF <- filter(outcomeDF, outcomeDF$State == sta)
  DFSlist <- arrange(finalDF, finalDF$State, finalDF$Outcomess, finalDF$Hospital)

  # results logic
  if (num > nrow(DFSlist)) {
    print("NA")
  } else if (num < nrow(DFSlist)) {
    res <- DFSlist[num, ]
  } else if (num == "best") {
    res <- DFSlist[1, ]
  } else {
    # (num == "worst") <- this doesn't seem to do anything
    res <- DFSlist[nrow(DFSlist), ]
  }

  return(res)

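Also, while cleaning up your code I found a stray expression, (num =="worst"), sitting after the final else. An else takes no condition, so that expression does nothing useful there and is probably interfering with your results. That could be the culprit.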

Also, num isn't defined for this snippet of code, and you have a hanging } which you probably know about too.
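On the point about naming a variable c, a small sketch shows one way the name can hide a bug. The functions f and g are hypothetical and only mirror the shape of the code in the question, where the result is assigned in some branches and not in others.

```r
# `c` is assigned only when flag is TRUE.
f <- function(flag) {
  if (flag) c <- "assigned"
  return(c)
}

# With no local `c`, the lookup falls through to the base function c(),
# so the function returns silently instead of failing.
print(is.function(f(FALSE)))
print(f(TRUE))

# The same shape with a name that does not exist elsewhere fails loudly.
g <- function(flag) {
  if (flag) res <- "assigned"
  return(res)
}
outcome_msg <- tryCatch(g(FALSE), error = function(e) "error: res not found")
print(outcome_msg)
```

With a name like res, a branch that forgets to assign the result raises an error at once, which is easier to debug. This assumes no object called res exists in an enclosing environment.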

Upvotes: 0
