crytpodoc
crytpodoc

Reputation: 51

Looping over multiple columns to generate a new variable based on a condition

I am trying to generate a new column (variable) based on the value inside multiple columns. I have over 60 columns in the dataset and I wanted to subset the columns that I want to loop through.

The column variables I am using in my condition at all characters, and when a certain pattern is matched, to return a value of 1 in the new variable.

I am using when because I need to run multiple conditions on each column to return a value.

CODE:

df read.csv("sample.csv")

*#Generate new variable name*
df$new_var <- 0

*#For loop through columns 16 to 45* 
for (i in colnames(df[16:45])) {
   df <- df %>%
     mutate(new_var= 
         case_when(
           grepl("I8501", df[[i]]) ~ 1
           ))
}

This does not work as when I table the results, I only get 1 value matched.

My other attempt was using:

for (i in colnames(df[16:45])) {
   df <- df %>%
    mutate(new_var= 
            case_when(
               df[[i]] == "I8501" ~ 1
            ))
} 

Any other possible ways to run through multiple columns with multiple conditions and change the value of the variable accordingly? to be achieved using R ?

Upvotes: 1

Views: 1214

Answers (2)

Vinay
Vinay

Reputation: 253

Kindly check if this works for your file.

Sample df:

df <- data.frame(C1=c('A','B','C','D'),C2=c(1,7,3,4),C3=c(5,6,7,8))

> df
  C1 C2 C3
1  A  1  5
2  B  7  6
3  C  3  7
4  D  4  8



library(dplyr)
df %>% 
 rowwise() %>%
 mutate(new_var = as.numeric(any(str_detect(c_across(2:last_col()), "7")))) # change the 2:last_col() to select your column range ex: 2:5

Output for finding "7" in any of the columns:

  C1       C2    C3 new_var
  <chr> <dbl> <dbl>   <dbl>
1 A         1     5       0
2 B         7     6       1
3 C         3     7       1
4 D         4     8       0

Upvotes: 0

Thomas Rosa
Thomas Rosa

Reputation: 742

If I'm understanding what you want, I think you just need to specify another case in your case_when() for keeping the existing values when things don't match "I8501". This is how I would do that:

df$new_var <- 0
for (index in (16:45)) {
  df <- df %>%
    mutate(
      new_var = case_when(
        grepl("I8501", df[[index]]) ~ 1,
        TRUE ~ df$new_var
      )
    )
}

I think a better way to do this though would be to use the ever useful apply():

has_match = apply(df[, 16:45], 1, function(x) sum(grepl("I8501", x)) > 0)
df$new_var = ifelse(has_match, 1, 0)

Upvotes: 1

Related Questions