Reputation: 23
I made a for loop with many if else statements for my dataset and 2 empty vectors. However, I am getting a warning message saying:
In closenessSupport[i] <- rowMeans(seniorEdPlans[c("closenessFriends", ... : number of items to replace is not a multiple of replacement length.
I just wondering on how to fix this vector length problem because I think it is messing with my intention to find the mean of 2 columns.. Any help appreciated.
Upvotes: 1
Views: 174
Reputation: 3923
Please accept one of the right answers
You had asked yesterday what was wrong with your loop. I looked today. The issue was running the rowwise
inside the loop. It's already based on rows so running it inside a for loop that iterates through your rows is bound to cause problems.
I also made an example data set with representative values for your data. May not matter for your current data but a for loop will be much slower. In the 20,000 rows case a for loop took 1.4 seconds. The dplyr
solution 11 milliseconds.
# build a reproducible dataset assume valid scores 1 - 8
# we'll make 9's equal to NA
set.seed(2020)
a <- sample(1:9, 20000, replace = TRUE)
a[a == 9] <- NA
set.seed(2021)
b <- sample(1:9, 20000, replace = TRUE)
b[b == 9] <- NA
seniorEdPlans2 <- data.frame(closenessFriends = a,
closenessMentors = b)
# use apply to calculate numSupportSources
seniorEdPlans2$numSupportSources <- apply(seniorEdPlans2,
1,
function(x) sum(!is.na(x))
)
# head(seniorEdPlans2, 50) # close enough
# this was the source of your error message it's already
# row based so can't put it in a for loop
seniorEdPlans2$closenessSupport <- rowMeans(seniorEdPlans2[c('closenessFriends', 'closenessMentors')],
na.rm = TRUE)
# your for loop
for (i in 1:nrow(seniorEdPlans2)) {
if (seniorEdPlans2$numSupportSources[i] == 2) {
seniorEdPlans2$supportType[i] <- "Both"
} else if (seniorEdPlans2$numSupportSources[i] == 0) {
seniorEdPlans2$supportType[i] <- "Neither"
} else if (!is.na(seniorEdPlans2$closenessFriends[i])) {
seniorEdPlans2$supportType[i] <- "FriendOnly"
} else {
seniorEdPlans2$supportType[i] <- "MentorOnly"
}
}
# head(seniorEdPlans2, 50)
Created on 2020-05-05 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 3923
Wow, way too many ith's for me. But a few nudges towards an answer. You definitely don't want a for loop down all the rows of your dataframe in this case. r
is optimized to work on columns. I'm not totally sure I understand all your conditionals, but most likely dplyr::case_when
will serve you well.
I grabbed your data and dput
ted just the first 20 rows. Then I wrote a mutate
and case_when
that produces a start towards closenessSupport
. Is this sort of what you're out to do?
Revised after your additional input just the columns of interest
# https://stackoverflow.com/questions/61582653
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
seniored <- structure(list(id = 1:20,
age = c(17L, 16L, 17L, 16L, 17L, 18L,
17L, 17L, 18L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 16L, 17L, 16L,
18L),
higherEd = structure(c(1L, 5L, 1L, 1L, 3L, 1L, 2L, 2L,
5L, 5L, 3L, 4L, 3L, 2L, 5L, 3L, 4L, 5L, 1L, 1L), .Label = c("2-year",
"4-year", "None", "Other", "Vocational"), class = "factor"),
riskGroup = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 3L,
1L, 3L, 3L, 2L, 1L, 3L, 2L, 2L, 3L, 1L, 3L), .Label = c("High",
"Low", "Medium"), class = "factor"),
GPA = c(3.169, 2.703,
3.225, 2.488, 2.618, 2.928, 3.176, 3.256, 3.48, 3.244, 3.265,
3.4, 3.109, 3.513, 3.102, 2.656, 2.853, 3.046, 2.304, 3.473
),
closenessFriends = c(7L, 7L, 7L, 8L, NA, NA, NA, 6L, 7L,
NA, 5L, 6L, 3L, 1L, 1L, NA, 8L, 2L, NA, 8L),
closenessMentors = c(6L,
NA, 5L, NA, 5L, 4L, 8L, 6L, 4L, 5L, 4L, 4L, 4L, 5L, 5L, 5L,
3L, 4L, NA, 5L),
numSupportSources = c(2L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 0L, 2L
)), row.names = c(NA, 20L), class = "data.frame")
seniored %>%
mutate(
closenessSupport = case_when(
numSupportSources == 1 & !is.na(closenessFriends) ~ as.numeric(closenessFriends),
numSupportSources == 1 & !is.na(closenessMentors) ~ as.numeric(closenessMentors),
numSupportSources == 2 ~ (closenessFriends + closenessMentors)/2,
numSupportSources == 0 ~ NA_real_),
supportType = case_when(
numSupportSources == 1 & !is.na(closenessFriends) ~ "FriendOnly",
numSupportSources == 1 & !is.na(closenessMentors) ~ "MentorOnly",
numSupportSources == 2 ~ "Both",
numSupportSources == 0 ~ "Neither"
)
) %>%
select(numSupportSources, closenessFriends, closenessMentors, closenessSupport, supportType)
#> numSupportSources closenessFriends closenessMentors closenessSupport
#> 1 2 7 6 6.5
#> 2 1 7 NA 7.0
#> 3 2 7 5 6.0
#> 4 1 8 NA 8.0
#> 5 1 NA 5 5.0
#> 6 1 NA 4 4.0
#> 7 1 NA 8 8.0
#> 8 2 6 6 6.0
#> 9 2 7 4 5.5
#> 10 1 NA 5 5.0
#> 11 2 5 4 4.5
#> 12 2 6 4 5.0
#> 13 2 3 4 3.5
#> 14 2 1 5 3.0
#> 15 2 1 5 3.0
#> 16 1 NA 5 5.0
#> 17 2 8 3 5.5
#> 18 2 2 4 3.0
#> 19 0 NA NA NA
#> 20 2 8 5 6.5
#> supportType
#> 1 Both
#> 2 FriendOnly
#> 3 Both
#> 4 FriendOnly
#> 5 MentorOnly
#> 6 MentorOnly
#> 7 MentorOnly
#> 8 Both
#> 9 Both
#> 10 MentorOnly
#> 11 Both
#> 12 Both
#> 13 Both
#> 14 Both
#> 15 Both
#> 16 MentorOnly
#> 17 Both
#> 18 Both
#> 19 Neither
#> 20 Both
Created on 2020-05-04 by the reprex package (v0.3.0)
Upvotes: 1