user95146
user95146

Reputation: 322

if/then replace values looping over rows conditional on column value(s) in R

I am trying to do an if/then replacement (recoding) of values row by row (looping over rows), based on one the values of or more columns in those rows. I've looked at a lot of prior examples here and elsewhere (R help) but haven't been able to get very far.

Here is an example data set:

> set.seed(1234)
> let<-c("AB","AA","BB")
> df <- data.frame(rbind(x1=c(12,"DF1",sample(let,6,TRUE)),x2=c(12,"HA.1",sample(let,6,TRUE)),x3=c(21,"DF1",sample(let,6,TRUE)),x4=c(12,"AS.2",sample(let,6,TRUE))
+ ))
> df
   X1   X2 X3 X4 X5 X6 X7 X8
x1 12  DF1 AB AA AA AA BB AA
x2 12 HA.1 AB AB AA AA BB AA
x3 21  DF1 AB BB AB BB AB AB
x4 12 AS.2 AB AB AB AB AB AB

I would like to conditionally change the coding (replace) values in columns 3:8 (from X3 through X8) based on values in X1 and X2 using if/then. 'AB' becomes 1 if X1=12 AND X2=DF1, 'AA' becomes 2 if X1=12 AND X2=DF1, 'BB' becomes 3 if X1=12 and X2=DF1 etc. There will be many other (nested?) if statements to add to complete this specific case, but I am not sure how to approach even the most basic aspect of this script: how to condition the replacement of values in columns 3:8 based on the column 1 value (and also column 2 or more columns) at a given row.

So, looping over each row, I would test if the value in X2 = DF1 and X1=12 (for example), and if so in both cases, change values of AB to 1, AA to 2, and BB to 3...

for(i in 1:nrow(df)){
      if((df$X2[i]=="DF1") & (df$X1[i]=12)) {   
          ifelse(df[i,3:8] == "AB", 1, ifelse(df[i,3:8]=="AA", 2,ifelse(df[i,3:8]=="BB",3,"NA")))}
             else{} 
      }

Now...this appears to do nothing - no changes to dfand no warnings. But the ifelse statements work when I specify the row (4):

> ifelse(df[4,3:8] == "AB", 1, ifelse(df[4,3:8]=="AA", 2,ifelse(df[4,3:8]=="BB",3,"NA")))
   X3  X4  X5  X6  X7  X8 
x4 "1" "3" "1" "1" "1" "2"
> df[4,3:8]
   X3 X4 X5 X6 X7 X8
x4 AB BB AB AB AB AA

So it must be something in the initial if & ? Do I need to have something in my else clause?

And of course, my 'real' world use case is more complicated as each different value in X1 or X2 will require different if/then statements to recode the values in colum 3:8

Anyway - am I even approaching this correctly? Would a look up table work better? I would be setting up additional nested if/& statements for each combination of values for X1 and X2. It will be ugly, but if I can get the nested if statements to work, then at least I can get there.

Thanks for any suggestions!

Upvotes: 1

Views: 857

Answers (2)

shs
shs

Reputation: 3899

if((df$X2[i]=="DF1") & (df$X1[i]=12))

In the second comparison you are using = instead of ==. Also, you are not assigning anything within your loop, that is why nothing is happening.

But iterating over rows of a data frame in R is generally not a good idea as it is not very performant. Furthermore, ifelse() already provides a vectorized solution. But unfortunately, you are not using it right, as you need to apply it to each of the columns you are working on.

I think the best solution for what you are trying to do would be the following:

df <- structure(list(X1 = c(12, 12, 21, 12), 
                     X2 = c("DF1", "HA.1", "DF1", "AS.2"), 
                     X3 = c("AB", "AB", "AB", "AB"), 
                     X4 = c("AA", "AB", "BB", "AB"), 
                     X5 = c("AA", "AA", "AB", "AB"), 
                     X6 = c("AA", "AA", "BB", "AB"), 
                     X7 = c("BB", "BB", "AB", "AB"), 
                     X8 = c("AA", "AA", "AB", "AB")), 
                class = "data.frame", row.names = c(NA, -4L)
                )
df
#>   X1   X2 X3 X4 X5 X6 X7 X8
#> 1 12  DF1 AB AA AA AA BB AA
#> 2 12 HA.1 AB AB AA AA BB AA
#> 3 21  DF1 AB BB AB BB AB AB
#> 4 12 AS.2 AB AB AB AB AB AB

rows <- df$X2 == "DF1" & df$X1 == 12
df[rows, 3:8] <- lapply(df[rows, 3:8], function(x) {as.integer(factor(x, c("AB", "AA", "BB")))})
df
#>   X1   X2 X3 X4 X5 X6 X7 X8
#> 1 12  DF1  1  2  2  2  3  2
#> 2 12 HA.1 AB AB AA AA BB AA
#> 3 21  DF1 AB BB AB BB AB AB
#> 4 12 AS.2 AB AB AB AB AB AB

Created on 2020-02-19 by the reprex package (v0.3.0)

You need to make sure here, that your variables are characters, not factors or it won't work.

You could also do it with nested ifelse() calls, as you did before, but it is already pretty verbose for just three.


Edit in response to comment:

# Unite the group vars in one vector
group <- paste(df$X1, df$X2, sep = "-")

# In this list you can say what case should use what recoding
lst <- list("12-DF1" = c("AB" = 1, "AA" = 2, "BB" = 3), 
            "12-HA.1" = c("AB" = 5, "AA" = 3, "BB" = 4), 
            "21-DF1" = c("AB" = 8, "AA" = 22, "BB" = 11), 
            "12-AS.2" = c("AB" = 9, "AA" = 7, "BB" = 6))

# Function to recode single column
rcde_cols <- function(column) mapply(function(col, g) {unlist(lst[[g]][col])}, 
                                     col = as.list(column), 
                                     g = as.list(group))

# Apply to all
df[3:8] <- lapply(df[3:8], rcde_cols)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389325

We can get the data in long format and then use separate case_when with conditions to recode values and finally get the data back in original format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -c(X1, X2)) %>%
  mutate(value = case_when(X1 == 12 & X2 == 'DF1' & value == 'AB' ~ 1,
                           X1 == 12 & X2 == 'DF1' & value == 'AA' ~ 2, 
                           X1 == 12 & X2 == 'DF1' & value == 'BB' ~ 3, 
                           #Add more conditions as per requirements
                           #....
                           #If none of the above condition satisfy 
                           #return a default value
                           TRUE ~ 0)) %>%
  pivot_wider()

case_when is an alternative to nested ifelse statements since it makes to easy to write for various conditions in simple steps.

Upvotes: 2

Related Questions