Reputation: 3
This is coursework from the R Programming course on Coursera. I've completed the course, but I really want to get to the bottom of the errors in my code.
The relevant fields of this data set contain hospitals, states, and three mortality measures. A function rankall(outcome, num = "best") needs to be created that returns a data frame showing, for each state, the hospital with the rank passed through the num argument for the mortality measure passed through outcome.
When num was a numeric value, my script worked correctly. However, when num was either "best" or "worst", incorrect hospitals were returned.
Calling tail(rankall("pneumonia", "worst"), 3) should return the following:
                                     hospital state
WI MAYO CLINIC HEALTH SYSTEM - NORTHLAND, INC    WI
WV                     PLATEAU MEDICAL CENTER    WV
WY           NORTH BIG HORN HOSPITAL DISTRICT    WY
The returned value I got is below:
                                state
WI "HOLY FAMILY MEMORIAL INC"   "WI"
WV "THOMAS MEMORIAL HOSPITAL"   "WV"
WY "SHERIDAN MEMORIAL HOSPITAL" "WY"
Thanks in advance to anyone who is willing to spare time reading my post.
Here is my script:
rankall <- function (outcome, num = "best") {
    dat <- read.csv("outcome-of-care-measures.csv", colClass = "character", na.strings = "Not Available")
    #verify outcome
    if (!outcome %in% c("heart attack", "heart failure", "pneumonia")){
        stop("invalid outcome")
    }
    #verify num
    if (!is.numeric(num)) {
        if (is.character(num)) {
            if (!num %in% c("best", "worst")){
                stop("invalid num")
            }
        }
    }
    col = c("heart attack" = 11, "heart failure" = 17, "pneumonia" = 23)
    state = unique(dat$State)
    state = state[order(state)]
    subdat <- dat[,c(2,7,col)]
    names(subdat) <- c("hospital", "state", "death")
    subdat$death <- as.numeric(subdat$death)
    subdat$hospital <- as.character(subdat$hospital)
    subdat <- subdat[complete.cases(subdat$death),]
    s = split(subdat, subdat$state)
    ls_hospital <- lapply(s, function(x, num) {
        x <- x[order(x$death, x$hospital, na.last = NA),]
        if (num == "best") {return(x$hospital[1])}
        else if (num == "worst") {return(x$hospital[nrow(x)])}
        else {return(x$hospital[num])}
    }, num)
    ans <- cbind(unlist(ls_hospital), state)
    return(ans)
}
Upvotes: 0
Views: 140
Reputation: 1751
There is only one error. When you subset the original data frame, your code is:
subdat <- dat[,c(2,7,col)]
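This selects the hospital name, the state, and all three columns listed in col, so no mortality measure is actually chosen. Since only the first three columns are then renamed, death is always column 11 (heart attack), whatever outcome is. The correct code is: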
subdat <- dat[,c(2,7,col[outcome])]
In this way the correct mortality measure is selected and the rest of the code works as expected.
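To see the difference on something small, here is a minimal sketch using a made-up three-hospital data frame. The toy data, its column positions (1 to 5 instead of 2, 7, 11, 17, 23) and the names wrong, right, worst_wrong and worst_right are assumptions for illustration only, not part of the assignment data.

```r
# Toy stand-in for the real file; hospitals and rates are invented.
dat <- data.frame(
  name  = c("A", "B", "C"),
  state = c("TX", "TX", "TX"),
  ha    = c("10.0", "12.0", "14.0"),  # heart attack
  hf    = c("9.0", "8.0", "7.0"),     # heart failure
  pn    = c("11.0", "15.0", "13.0"),  # pneumonia
  stringsAsFactors = FALSE
)
# Same idea as col in the question, but with the toy column positions.
col <- c("heart attack" = 3, "heart failure" = 4, "pneumonia" = 5)
outcome <- "pneumonia"

# Original subsetting: all three measures are kept (5 columns),
# and the third column is the heart attack rate.
wrong <- dat[, c(1, 2, col)]
worst_wrong <- wrong[[1]][which.max(as.numeric(wrong[[3]]))]

# Fixed subsetting: indexing the named vector by outcome keeps one measure.
right <- dat[, c(1, 2, col[outcome])]
names(right) <- c("hospital", "state", "death")
right$death <- as.numeric(right$death)
worst_right <- right$hospital[which.max(right$death)]

print(ncol(wrong))   # 5
print(ncol(right))   # 3
print(worst_wrong)   # "C" (worst for heart attack, not pneumonia)
print(worst_right)   # "B" (worst for pneumonia)
```

In this toy case the unfixed version ranks by the heart attack column even though pneumonia was requested, which is the same kind of mismatch as in the question.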
Upvotes: 3