learneR

Reputation: 53

R for loop with if-else statement and reference to the outcome of the previous iteration

I have a data frame with a field x that contains both group names (shown as letters in the example below) and the members of each group (listed under the group name, shown as numbers). I want to create a field that shows, for each member, the name of its group. In the data frame below, the desired output is shown in the column "outcome".

library(dplyr)  # provides %>% and mutate()

df <- data.frame("x"=c("A","1","2","B","C","1","2","C","D","1"),
                 "outcome"=c("A","A","A","B","C","C","C","C","D","D")
) %>%
  mutate(
    Letter = ifelse(grepl("[A-Za-z]", x), "Letter",
                      "No Letter")
  )

My idea is to do this with a for loop. If x is a letter, the loop should return that letter; if not, it should return the outcome of the previous iteration (which is the most recent letter found in x). The for loop below doesn't give the right output:

df$outcome_calc[1] <- "A" 
for (i in 2:10) {  
  df$outcome_calc[i] <- ifelse(df$Letter[i] == "No Letter",df$outcome_calc[i-1],df$x[i])    

}

Any ideas how to get the right output?

Upvotes: 1

Views: 209

Answers (5)

Rui Barradas

Reputation: 76402

Here are two tidyverse ways, very similar, using the convenience function zoo::na.locf.

First:

library(tidyverse)

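df %>%
  # non-numeric entries become NA; suppressWarnings() hides the coercion warning
  mutate(na = is.na(suppressWarnings(as.numeric(as.character(x)))),
         # keep the group names, blank out the members
         outcome2 = ifelse(na, as.character(x), NA_character_),
         # carry the last group name forward
         outcome2 = zoo::na.locf(outcome2)) %>%
  select(-na)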

Another one:

df %>%
  mutate(chr = !grepl("[[:digit:]]", x),
         outcome2 = ifelse(chr, as.character(x), NA_character_),
         outcome2 = zoo::na.locf(outcome2)) %>%
  select(-chr)
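
For readers who don't have zoo installed, the same "carry the last group name forward" idea can be sketched in base R. This is not how zoo::na.locf works internally, just a swapped-in cumsum() indexing trick. The names is_letter and grp are made up for illustration, and the sketch assumes that the first row holds a group name.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps x as character
df <- data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                 stringsAsFactors = FALSE)

# TRUE where x holds a group name (contains a letter)
is_letter <- grepl("[A-Za-z]", df$x)

# cumsum() increases by one at every group name, so each row gets the
# running index of the most recent group name at or above it
grp <- cumsum(is_letter)

# Look up that group name for every row
# (assumption: row 1 is a group name, otherwise grp starts at 0 and this fails)
df$outcome2 <- df$x[is_letter][grp]
```

If the data could start with a member row, the zero indices would need to be handled separately before the lookup.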

Upvotes: 2

Uwe

Reputation: 42544

dplyr

Here is a streamlined version of Rui's second approach that doesn't require creating a temporary helper column. It uses stringr::str_detect(), if_else(), and zoo::na.locf().

library(dplyr)
df %>% 
  mutate(outcome2 = if_else(stringr::str_detect(x, "\\D"), x, factor(NA)) %>% zoo::na.locf())
   x outcome    Letter outcome2
1  A       A    Letter        A
2  1       A No Letter        A
3  2       A No Letter        A
4  B       B    Letter        B
5  C       C    Letter        C
6  1       C No Letter        C
7  2       C No Letter        C
8  C       C    Letter        C
9  D       D    Letter        D
10 1       D No Letter        D

data.table

For the sake of completeness, here is also a data.table approach that I have used frequently. It uses assignment by reference to update df.

library(data.table)
setDT(df)[x %like% "\\D", outcome2 := x][, outcome2 := zoo::na.locf(outcome2)][]
    x outcome    Letter outcome2
 1: A       A    Letter        A
 2: 1       A No Letter        A
 3: 2       A No Letter        A
 4: B       B    Letter        B
 5: C       C    Letter        C
 6: 1       C No Letter        C
 7: 2       C No Letter        C
 8: C       C    Letter        C
 9: D       D    Letter        D
10: 1       D No Letter        D

Upvotes: 0

jakub

Reputation: 5104

Using tidyr::fill, which requires NA wherever your numbers were:

df = data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                stringsAsFactors = FALSE)

df$x[grepl("[0-9]+", df$x)] = NA

tidyr::fill(df, x)
   x
1  A
2  A
3  A
4  B
5  C
6  C
7  C
8  C
9  D
10 D
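
The code above overwrites x. If the original values should be kept, a possible variation is to write the NAs into a new column and fill that one instead. This is only a sketch: the column name group and the object res are made up for illustration, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

df <- data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # copy x into a new column, with NA wherever x contains a digit
  mutate(group = ifelse(grepl("[0-9]+", x), NA_character_, x)) %>%
  # fill() carries the last non-NA value downwards by default
  fill(group)
```

Here res keeps x untouched and holds the group names in res$group.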

Upvotes: 1

dc37

Reputation: 16178

Alternatively, you can avoid the for loop by using the sapply function.

You can define the position of your letters:

pos_letter <- grep("[A-Za-z]", df$x)

Then, use sapply to 1) find, for each row, the position of the closest letter at or above it, and 2) replace each position with the corresponding letter:

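# position of the closest letter at or above each row
df$out <- sapply(seq_len(nrow(df)), function(i) max(pos_letter[pos_letter <= i]))
# look up the letter stored at that position
df$out2 <- sapply(df$out, function(i) as.character(df[i, "x"]))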

   x outcome out out2
1  A       A   1    A
2  1       A   1    A
3  2       A   1    A
4  B       B   4    B
5  C       C   5    C
6  1       C   5    C
7  2       C   5    C
8  C       C   8    C
9  D       D   9    D
10 1       D   9    D

You can combine both sapply calls into a single line by writing:

sapply(1:nrow(df), function(n) as.character(df[max(pos_letter[pos_letter <= n]),"x"]))

[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
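
The "position of the closest letter at or above" lookup can also be done without sapply. The sketch below swaps in base R's findInterval() for that step. The names idx and out are made up for illustration, and the sketch assumes the first row is a letter (otherwise findInterval() returns 0 for the leading rows and the lookup drops them).

```r
df <- data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                 stringsAsFactors = FALSE)

# rows that hold a letter (sorted ascending, as findInterval() requires)
pos_letter <- grep("[A-Za-z]", df$x)

# for each row number, the index of the last letter position <= that row
idx <- findInterval(seq_len(nrow(df)), pos_letter)

# translate the index back into a row position, then into the letter
out <- df$x[pos_letter[idx]]
```

This gives the same result as the sapply version for the sample data while doing the search in a single vectorized call.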

Upvotes: 1

YOLO

Reputation: 21709

Here's a way to do this using a for loop:

# keeps track of previous letter
prev = ''

# output
op = c()

for (i in df$x){

    # check the pattern
    check = grepl(pattern = '[a-zA-Z]', x = i, ignore.case = T)

    if(isTRUE(check)){
        op = c(op, i)
        prev = i
    } else {
        op = c(op, prev)
    }

}

print(op)
[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
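
A variation that stays closer to the loop in the question is sketched below. It rests on an assumption about why the original loop failed: if x was stored as a factor (the data.frame() default before R 4.0), ifelse() returns the factor's integer codes rather than its labels. The sketch therefore builds x as character and uses a plain if/else on one value at a time. It also preallocates the result instead of growing it. The name n is made up for illustration.

```r
df <- data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
                 stringsAsFactors = FALSE)  # keep x as character, not factor

n <- nrow(df)
outcome_calc <- character(n)  # preallocate the result vector

for (i in seq_len(n)) {
  if (grepl("[A-Za-z]", df$x[i])) {
    # a group name: use it directly
    outcome_calc[i] <- df$x[i]
  } else if (i > 1) {
    # a member: reuse the outcome of the previous iteration
    outcome_calc[i] <- outcome_calc[i - 1]
  } else {
    # a member in the very first row has no group above it
    outcome_calc[i] <- NA_character_
  }
}

df$outcome_calc <- outcome_calc
```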

Upvotes: 1
