Anonymous coward

Reputation: 2091

How to make a For loop that keeps the original row value

I am trying to run multiple conditional statements in a loop. My first conditional is an if, else if with 3 conditions (4 technically if nothing matches). My second really only needs one condition, and I want to keep the original row value if it doesn't meet that condition. The problem is my output doesn't match the row numbers, and I'm not sure how to output only to a specific row in a loop.

I want to loop over each column, and within each column use sapply to check each value: outside of range1 is marked 4, inside of range1 is marked 1, NA is marked 9, and anything else is marked -999. A narrower range2 would then be applied: if a value that passed the first check falls outside of range2, mark it with a 3; otherwise leave the existing mark unchanged.

My partially working code, and a reproducible example is below. My input and first loop is:

df <- structure(list(A = c(-2, 3, 5, 10, NA), A.c = c(NA, NA, NA, NA, NA), B = c(2.2, -55, 3, NA, 99), B.c = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L))

> df
   A A.c     B B.c
1 -2  NA   2.2  NA
2  3  NA -55.0  NA
3  5  NA   3.0  NA
4 10  NA    NA  NA
5 NA  NA  99.0  NA

min1 <- 0
max1 <- 8

test1.func <- function(x) {
  val <- if (!is.na(x) & is.numeric(x) & (x < min1 | x > max1)){
    num = 4
  } else if (!is.na(x) & is.numeric(x) & x >= min1 & x <= max1){
    num = 1
  } else if (is.na(x)){ # TODO it would be better to make this just what is already present in the row
    num = 9
  } else {
    num = -999
  }
  val
}

Test1 <- function(x) {
  i <- NA
  for(i in seq(from = 1, to = ncol(x), by = 2)){
    x[, i + 1] <- sapply(x[[i]], test1.func)
  }
  x
}

df_result <- Test1(df)

> df_result
   A A.c     B B.c
1 -2   4   2.2   1
2  3   1 -55.0   4
3  5   1   3.0   1
4 10   4    NA   9
5 NA   9  99.0   4

The next loop and conditional (any existing values of 4 or 9 would remain):

min2 <- 3
max2 <- 5

test2.func <- function(x) {
  val <- if (!is.na(x) & is.numeric(x) & (x < min2 | x > max2)){
    num = 3
  }
  val
}

Test2 <- function(x) {
  i <- NA
  for(i in seq(from = 1, to = ncol(x), by = 2)){
    x[, i + 1] <- sapply(x[[i]], test2.func)
  }
  x
}

df_result2 <- Test2(df_result)
# Only 2.2 matches, if working correctly would output
> df_result2
   A A.c     B B.c
1 -2   4   2.2   3
2  3   1 -55.0   4
3  5   1   3.0   1
4 10   4    NA   9
5 NA   9  99.0   4

The current code gives a warning, since there is only one match:

Warning messages:
1: In `[<-.data.frame`(`*tmp*`, , i + 1, value = list(3, NULL, NULL,  :
  provided 5 variables to replace 1 variables

Upvotes: 0

Views: 346

Answers (1)

r2evans

Reputation: 160397

Some thoughts.

  1. for loops are not necessary; it is better to capitalize on R's vectorized operations.
  2. It appears that your values of 4 and 3 really mean something like "outside band 1" and "outside band 2", in which case this can be resolved in one function.
  3. Testing for == "NA" is a bit off ... if one of the values in a column is a string "NA" (and not R's NA value), then all values in that column are strings and you have other problems. Because of this, I don't explicitly check for is.numeric, though it is not hard to work back in.
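
To illustrate the third point, here is a minimal sketch (the vectors are made up for illustration) of how a single "NA" string changes the type of the whole vector, which is why is.na() on numeric data is the check to rely on:

```r
# Hypothetical vectors, for illustration only
num_col <- c(2.2, NA, 99)      # a real missing value; the vector stays numeric
chr_col <- c(2.2, "NA", 99)    # one string coerces every element to character

is.numeric(num_col)  # TRUE
is.numeric(chr_col)  # FALSE
is.na(num_col)       # FALSE  TRUE FALSE
is.na(chr_col)       # FALSE FALSE FALSE: the string "NA" is not a missing value
```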

Try this:

func <- function(x, range1, range2) {
  ifelse(is.na(x), 9L,
         ifelse(x < range1[1] | x > range1[2], 4L,
                ifelse(x < range2[1] | x > range2[2], 3L,
                       1L)))
}

df[,c("A.c", "B.c")] <- lapply(df[,c("A", "B")], func, c(0, 8), c(3, 5))
df
#    A A.c     B B.c
# 1 -2   4   2.2   3
# 2  3   1 -55.0   4
# 3  5   1   3.0   1
# 4 10   4    NA   9
# 5 NA   9  99.0   4
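
If the two checks really do need to stay as separate passes, the existing flag can be passed in alongside the values so that non-matching rows keep whatever is already there. The warning in the question arises because test2.func returns NULL whenever its condition is false, so sapply cannot simplify the result to a vector and returns a list instead. The following is a minimal sketch of that idea, not the approach above; flag_band2 and its arguments are names made up here, and it assumes a flag of 1 means "inside range 1":

```r
# Hypothetical helper: upgrade flag 1 to 3 when the value lies outside the
# narrower range, and leave every other flag (4, 9, ...) untouched.
flag_band2 <- function(x, flag, range2) {
  outside2 <- !is.na(x) & (x < range2[1] | x > range2[2])
  ifelse(flag == 1 & outside2, 3, flag)
}

# Values and first-pass flags for column B from the question
B   <- c(2.2, -55, 3, NA, 99)
B.c <- c(1, 4, 1, 9, 4)
flag_band2(B, B.c, c(3, 5))
# [1] 3 4 1 9 4
```

Because the "else" branch of ifelse is the existing flag vector, the result always has one element per row, which avoids the length mismatch.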

One problem I have with this is that it uses three nested ifelse calls. While this works fine, it can be difficult to trace and troubleshoot (and ifelse has problems of its own). If you have other conditions to incorporate, it might be nice to use dplyr::case_when.

func2 <- function(x, range1, range2) {
  dplyr::case_when(
    is.na(x)                      ~ 9L,
    x < range1[1] | x > range1[2] ~ 4L,
    x < range2[1] | x > range2[2] ~ 3L,
    TRUE                          ~ 1L
  )
}

I find this second method much easier to read, though it does have the added dependency of dplyr (which, while it definitely has advantages and strengths, includes an army of other dependencies). If you are already using any of the tidyverse packages in your workflow, though, this is likely the better solution.
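
For anyone who wants the flat layout without the dependency, one base-R option is to start from a default and overwrite it with logical-index assignments, lowest priority first, so that later lines win. This is a sketch under that assumption rather than a drop-in equivalent of case_when, and func3 is a made-up name:

```r
# Hypothetical base-R variant: assign the default, then overwrite in
# increasing order of priority (the last matching rule wins).
func3 <- function(x, range1, range2) {
  out <- rep(1L, length(x))                        # inside both ranges
  out[which(x < range2[1] | x > range2[2])] <- 3L  # outside range 2
  out[which(x < range1[1] | x > range1[2])] <- 4L  # outside range 1
  out[is.na(x)] <- 9L                              # missing value
  out
}

func3(c(2.2, -55, 3, NA, 99), c(0, 8), c(3, 5))
# [1] 3 4 1 9 4
```

Note that the rules appear in the reverse order of the case_when version, since here the last assignment takes precedence; which() is used so that NA comparisons are dropped rather than indexed.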

Upvotes: 2
