clarke19
clarke19

Reputation: 25

Combining ifelse statements and for loops R

I have data on whether someone has been accepted to a range of schools (1 to 35) and how they have preferences these school. My dataset will have many more students and schools eventually but this is a mock-up dataset to do some trail work with. The data looks like this, but will have many more schools also:

> head(Schools)
     ID              S1_AR S1_Rank              S2_AR S2_Rank
1 ID001 Provisional Accept       1 Provisional Accept       2
2 ID002 Provisional Accept       1     No Application      NA
3 ID003 Provisional Reject       1 Provisional Accept       2
4 ID004 Provisional Reject       2 Provisional Accept       1
5 ID005 Provisional Accept       3 Provisional Accept       1

Each person cannot attend more than one school, so if they are provisionally accepted at more than one school their acceptance is dis-considered for the school they have ranked lowest in their preferences. This means that their place can then become available for another student to take.

I've written code which identifies students that have been accepted to schools 1 and 2, compares their ranking, and updates their acceptance status, as below. However, this is for only one pair of schools and I will need to do it for each pair of 35 different schools. Is there a way of using the for loop to automate this process for the different pairs of schools?

Also, I know my code is fairly clunky at the moment but this is the only way I've got it to work at the moment. Would using nested ifelse() help make this code more manageable within a for loop?

Schools$Match <-ifelse(Schools$S1_AR == "Provisional Accept" & Schools$S2_AR == "Provisional Accept", 1, 0)
Schools$Preference<- ifelse(Schools$S1_Rank<Schools$S2_Rank, 1,2)
Schools$S2_AR[Schools$Match == 1 & Schools$Preference == 1]<- "Accepted Elsewhere"
Schools$S1_AR[Schools$Match == 1 & Schools$Preference == 2]<- "Accepted Elsewhere"


head(Schools)

     ID              S1_AR S1_Rank              S2_AR S2_Rank Match Preference
1 ID001 Provisional Accept       1 Accepted Elsewhere       2     1          1
2 ID002 Provisional Accept       1     No Application      NA     0         NA
3 ID003 Provisional Reject       1 Provisional Accept       2     0          1
4 ID004 Provisional Reject       2 Provisional Accept       1     0          2
5 ID005 Accepted Elsewhere       3 Provisional Accept       1     1          2

Upvotes: 0

Views: 122

Answers (1)

Harry Smith
Harry Smith

Reputation: 278

Here is a real hacky function, but it should work for as many schools as you want as long as the data set looks like the one you provided. Basically it does the following:

  1. Makes a temp data set and adds a variable that checks to see if a student has "accepted" at more than one school meaning a decision will need to be made based on the rank of choice (per your description if I understand that right).

  2. Next, it generates 2 vectors of indices. These indices will be used to extract the column name or school that is either chosen due to default (no choice) or chosen due to rank.

  3. Generates a vector of school names based on logic checks. If a choice needs to be made the school names with the lowest rank will be added to the vector at the position of the student, and if no decision needs to be made it will add the school that the student accepted. I did this incase there is ever a situation in which a student accepted a school that wasn't there first choice (see row 6 in example data).

As @slava-kohut, points out, there is probably a more elegant way of doing this all in tidy, and it would be worth looking into if you can.

Here is the function. I hope this helps:

## data set
df <- data.frame("ID" = 1:6,
             "S1_AR" = c("PA", "PA", "PR", "PR", "PA", "PA"),
             "S1_Rank" = c(1, 1, 1, 2, 3, 2),
             "S2_AR" = c("PA", "NA", "PA", "PA", "PA", "PR"),
             "S2_Rank" = c(2, NA, 2, 1, 1, 1))
my_fun <- function(data){
  ## generate tmp data and ready data for downstream classification
  tmp <- data %>%
    mutate(ck_1 = rowSums(. == "PA"), 
           ck_2 = ifelse(ck_1 > 1, TRUE, FALSE),
           ck_2 = ifelse(is.na(ck_2), FALSE, ck_2))
  ## generate rank table
  tmp.rank <- tmp %>%
    dplyr::select(contains("_Rank"))
  ## generate choice table
  tmp.ar <- tmp %>%
    dplyr::select(contains("_AR"))
  ## generate index of highest ranked school choice
  index_choose <- apply(tmp[, which(colnames(tmp) %in% colnames(tmp.rank))], 1, 
function(x){
    which.min(x)
  })
  ## generate index for school accepted when no choice needs to be made
  index_nochoose <- sapply(apply(tmp[, which(colnames(tmp) %in% 
  colnames(tmp.ar))], 1, function(x){
    which(x == "PA")
  }), function(a) a[[1]])
  ## Generate decision vector
  decision <- c()
  for(i in 1:nrow(df)){
    ifelse(tmp$ck_2[i] == TRUE, decision[i] <- colnames(tmp.rank) 
    [index_choose[i]],
           decision[i] <- colnames(tmp.ar)[index_nochoose[i]]) 
  }
  decision <- sapply(strsplit(decision, split = "_", fixed = TRUE),  
  function(x) 
    x[1])
  ## add vector and output result
  tmp$decision <- decision
  return(tmp)
}

my_fun(df)


  ID S1_AR S1_Rank S2_AR S2_Rank ck_1  ck_2 decision
1  1    PA       1    PA       2    2  TRUE       S1
2  2    PA       1    NA      NA   NA FALSE       S1
3  3    PR       1    PA       2    1 FALSE       S2
4  4    PR       2    PA       1    1 FALSE       S2
5  5    PA       3    PA       1    2  TRUE       S2
6  6    PA       2    PR       1    1 FALSE       S1

Upvotes: 0

Related Questions