user13119375

Reputation: 23

Complex If Else Statement in For Loop in R Warning Message

I made a for loop with many if else statements for my dataset and 2 empty vectors. However, I am getting a warning message saying:

In closenessSupport[i] <- rowMeans(seniorEdPlans[c("closenessFriends", ... : number of items to replace is not a multiple of replacement length.

I am just wondering how to fix this vector length problem, because I think it is interfering with my attempt to find the mean of 2 columns. Any help is appreciated.

Upvotes: 1

Views: 174

Answers (2)

Chuck P

Reputation: 3923

Please accept one of the right answers

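You asked yesterday what was wrong with your loop, and I looked today. The issue is calling rowMeans inside the loop. rowMeans already works row by row and returns one value for every row of the data frame, so assigning its whole result to the single element closenessSupport[i] on each pass of a for loop produces the "number of items to replace is not a multiple of replacement length" warning.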
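To make the cause concrete, here is a minimal sketch that reproduces the warning on a made-up three-row data frame. The object names toy, all_means and res are only for illustration and are not from your data; the tryCatch is there just to capture the warning text instead of printing it.

```r
# Hypothetical toy data standing in for the two closeness columns.
toy <- data.frame(closenessFriends = c(7, NA, 5),
                  closenessMentors = c(6, 4, NA))

closenessSupport <- numeric(nrow(toy))

# rowMeans() returns one mean per row, so the result has length 3.
all_means <- rowMeans(toy[c("closenessFriends", "closenessMentors")],
                      na.rm = TRUE)
length(all_means)

# Putting those 3 values into the single slot closenessSupport[1]
# raises the warning; tryCatch() captures its message as a string.
res <- tryCatch(
  {
    closenessSupport[1] <- all_means
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
res

# The fix: assign the whole vector once, outside any loop.
closenessSupport <- unname(all_means)
closenessSupport
```

In other words, nothing about the if/else logic is at fault here; the mismatch is between a length-one target and a full-length result.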

I also made an example data set with representative values for your data. It may not matter for data of your current size, but a for loop will be much slower: with 20,000 rows the for loop took 1.4 seconds, while the dplyr solution took 11 milliseconds.

# Build a reproducible dataset; assume valid scores are 1 to 8,
# so we'll recode 9's as NA.

set.seed(2020)
a <- sample(1:9, 20000, replace = TRUE)
a[a == 9] <- NA
set.seed(2021)
b <- sample(1:9, 20000, replace = TRUE)
b[b == 9] <- NA

seniorEdPlans2 <- data.frame(closenessFriends = a,
                              closenessMentors = b)

# use apply to calculate numSupportSources
seniorEdPlans2$numSupportSources <- apply(seniorEdPlans2, 
                                          1, 
                                          function(x) sum(!is.na(x))
                                          )

# head(seniorEdPlans2, 50) # close enough

# This was the source of your warning message: rowMeans() is already
# row based, so call it once here rather than inside a for loop.
seniorEdPlans2$closenessSupport <- rowMeans(seniorEdPlans2[c('closenessFriends', 'closenessMentors')], 
                                           na.rm = TRUE)

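# your for loop (supportType is created first so each row can be filled in)
seniorEdPlans2$supportType <- NA_character_
for (i in seq_len(nrow(seniorEdPlans2))) {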
  if (seniorEdPlans2$numSupportSources[i] == 2) {
    seniorEdPlans2$supportType[i] <- "Both"
  } else if (seniorEdPlans2$numSupportSources[i] == 0) {
    seniorEdPlans2$supportType[i] <- "Neither"
  } else if (!is.na(seniorEdPlans2$closenessFriends[i])) {
    seniorEdPlans2$supportType[i] <- "FriendOnly"
  } else {
    seniorEdPlans2$supportType[i] <- "MentorOnly"
  }
}

# head(seniorEdPlans2, 50)

Created on 2020-05-05 by the reprex package (v0.3.0)
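For comparison, here is a rough sketch of a vectorized base R replacement for the loop, using nested ifelse. It is not part of the reprex above: the data frame df, the seed, and the column names supportTypeLoop and supportTypeVec are made up for this illustration, and the timings it reports will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed, only for this illustration
n <- 20000
df <- data.frame(
  closenessFriends = sample(c(1:8, NA), n, replace = TRUE),
  closenessMentors = sample(c(1:8, NA), n, replace = TRUE)
)
cols <- c("closenessFriends", "closenessMentors")
df$numSupportSources <- rowSums(!is.na(df[cols]))

# Row-by-row loop, same branching as the loop in the answer.
loop_time <- system.time({
  df$supportTypeLoop <- NA_character_
  for (i in seq_len(nrow(df))) {
    if (df$numSupportSources[i] == 2) {
      df$supportTypeLoop[i] <- "Both"
    } else if (df$numSupportSources[i] == 0) {
      df$supportTypeLoop[i] <- "Neither"
    } else if (!is.na(df$closenessFriends[i])) {
      df$supportTypeLoop[i] <- "FriendOnly"
    } else {
      df$supportTypeLoop[i] <- "MentorOnly"
    }
  }
})

# Vectorized version: each ifelse() handles the whole column at once.
vec_time <- system.time({
  df$supportTypeVec <- ifelse(
    df$numSupportSources == 2, "Both",
    ifelse(df$numSupportSources == 0, "Neither",
           ifelse(!is.na(df$closenessFriends), "FriendOnly", "MentorOnly"))
  )
})

# Both approaches should label every row identically.
identical(df$supportTypeLoop, df$supportTypeVec)
loop_time["elapsed"]
vec_time["elapsed"]
```

The nested ifelse mirrors the if / else if chain one branch at a time, which makes it easy to check against the loop before dropping the loop altogether.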

Upvotes: 1

Chuck P

Reputation: 3923

Wow, way too many [i] subscripts for me, but here are a few nudges towards an answer. You definitely don't want a for loop down all the rows of your data frame in this case; R is optimized to work on whole columns. I'm not totally sure I understand all your conditionals, but most likely dplyr::case_when will serve you well.

I grabbed your data and dputted just the first 20 rows. Then I wrote a mutate and case_when that produces a start towards closenessSupport. Is this sort of what you're out to do?

Revised after your additional input to show just the columns of interest.

# https://stackoverflow.com/questions/61582653
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
seniored <- structure(
  list(
    id = 1:20,
    age = c(17L, 16L, 17L, 16L, 17L, 18L, 17L, 17L, 18L, 16L,
            17L, 17L, 17L, 17L, 17L, 17L, 16L, 17L, 16L, 18L),
    higherEd = structure(
      c(1L, 5L, 1L, 1L, 3L, 1L, 2L, 2L, 5L, 5L,
        3L, 4L, 3L, 2L, 5L, 3L, 4L, 5L, 1L, 1L),
      .Label = c("2-year", "4-year", "None", "Other", "Vocational"),
      class = "factor"),
    riskGroup = structure(
      c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 3L, 1L,
        3L, 3L, 2L, 1L, 3L, 2L, 2L, 3L, 1L, 3L),
      .Label = c("High", "Low", "Medium"),
      class = "factor"),
    GPA = c(3.169, 2.703, 3.225, 2.488, 2.618, 2.928, 3.176, 3.256, 3.48, 3.244,
            3.265, 3.4, 3.109, 3.513, 3.102, 2.656, 2.853, 3.046, 2.304, 3.473),
    closenessFriends = c(7L, 7L, 7L, 8L, NA, NA, NA, 6L, 7L, NA,
                         5L, 6L, 3L, 1L, 1L, NA, 8L, 2L, NA, 8L),
    closenessMentors = c(6L, NA, 5L, NA, 5L, 4L, 8L, 6L, 4L, 5L,
                         4L, 4L, 4L, 5L, 5L, 5L, 3L, 4L, NA, 5L),
    numSupportSources = c(2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
                          2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 0L, 2L)
  ),
  row.names = c(NA, 20L),
  class = "data.frame"
)
seniored %>%
  mutate(
    closenessSupport = case_when(
      numSupportSources == 1 & !is.na(closenessFriends) ~ as.numeric(closenessFriends),
      numSupportSources == 1 & !is.na(closenessMentors) ~ as.numeric(closenessMentors),
      numSupportSources == 2 ~ (closenessFriends + closenessMentors)/2,
      numSupportSources == 0 ~ NA_real_),
    supportType = case_when(
      numSupportSources == 1 & !is.na(closenessFriends) ~ "FriendOnly",
      numSupportSources == 1 & !is.na(closenessMentors) ~ "MentorOnly",
      numSupportSources == 2 ~ "Both",
      numSupportSources == 0 ~ "Neither"
    )
  ) %>%
  select(numSupportSources, closenessFriends, closenessMentors, closenessSupport, supportType)
#>    numSupportSources closenessFriends closenessMentors closenessSupport
#> 1                  2                7                6              6.5
#> 2                  1                7               NA              7.0
#> 3                  2                7                5              6.0
#> 4                  1                8               NA              8.0
#> 5                  1               NA                5              5.0
#> 6                  1               NA                4              4.0
#> 7                  1               NA                8              8.0
#> 8                  2                6                6              6.0
#> 9                  2                7                4              5.5
#> 10                 1               NA                5              5.0
#> 11                 2                5                4              4.5
#> 12                 2                6                4              5.0
#> 13                 2                3                4              3.5
#> 14                 2                1                5              3.0
#> 15                 2                1                5              3.0
#> 16                 1               NA                5              5.0
#> 17                 2                8                3              5.5
#> 18                 2                2                4              3.0
#> 19                 0               NA               NA               NA
#> 20                 2                8                5              6.5
#>    supportType
#> 1         Both
#> 2   FriendOnly
#> 3         Both
#> 4   FriendOnly
#> 5   MentorOnly
#> 6   MentorOnly
#> 7   MentorOnly
#> 8         Both
#> 9         Both
#> 10  MentorOnly
#> 11        Both
#> 12        Both
#> 13        Both
#> 14        Both
#> 15        Both
#> 16  MentorOnly
#> 17        Both
#> 18        Both
#> 19     Neither
#> 20        Both

Created on 2020-05-04 by the reprex package (v0.3.0)
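If you would rather stay in base R for closenessSupport, a sketch along these lines should give the same numbers as the case_when above. The data frame mini is a made-up four-row stand-in for seniored, and the NaN handling is my assumption about what you want for rows with no scores: rowMeans with na.rm = TRUE returns NaN when every value in a row is NA, so that case is converted to NA by hand.

```r
# Hypothetical mini data set with the same column names as `seniored`.
mini <- data.frame(closenessFriends = c(7L, 7L, NA, NA),
                   closenessMentors = c(6L, NA, 5L, NA))

cols <- c("closenessFriends", "closenessMentors")

# One call covers "both", "friend only" and "mentor only" rows.
mini$closenessSupport <- unname(rowMeans(mini[cols], na.rm = TRUE))

# Rows with no scores at all come back as NaN; recode them to NA
# so they match the numSupportSources == 0 branch of case_when().
mini$closenessSupport[is.nan(mini$closenessSupport)] <- NA_real_

mini
```

This keeps rowMeans where it belongs, applied once to the whole data frame, and leaves case_when (or ifelse) for the text labels in supportType.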

Upvotes: 1
