greengrass62
greengrass62

Reputation: 986

Mutate values in one column based on another column

I have a dataframe with many columns of numerical data, paired with corresponding quality columns. In the below example, dat is a data frame with numerical columns v1 and v3 that are paired with the quality control columns v1_a and v3_a. v2 does not have a quality control column.

I want to set all the values in v1 and v3 to NA_real_ if the corresponding values in v1_a and v3_a are either "A" or "D". I am looking for an approach within dplyr and using %>% that can be extended to any number of paired numeric and quality control columns.

set.seed(10)
dat   <- tibble(v1 = runif(10)
                , v1_a = c("A", "A", NA, "B", NA, NA, NA, "D", NA, "A" )
                , v2 = runif(10)
                , v3 = runif(10)
                , v3_a = c(NA, "A", "D", "B", NA, "A", NA, "A", NA, "A" ))

dat
       v1 v1_a      v2    v3 v3_a 
 1 0.536  A     0.275  0.354 NA   
 2 0.0931 A     0.229  0.936 A    
 3 0.170  NA    0.0144 0.246 D    
 4 0.900  B     0.729  0.473 B    
 5 0.423  NA    0.250  0.192 NA   
 6 0.748  NA    0.161  0.583 A    
 7 0.823  NA    0.0170 0.459 NA   
 8 0.955  D     0.486  0.467 A    
 9 0.685  NA    0.103  0.400 NA   
10 0.501  A     0.802  0.505 A     

For the above example data, and assuming I want to screen for "A" and "D" (and not "B" or NA), the result would be:

       v1 v1_a      v2     v3 v3_a 
 1 NA     A     0.275   0.354 NA   
 2 NA     A     0.229  NA     A    
 3  0.170 NA    0.0144 NA     D    
 4  0.900 B     0.729   0.473 B    
 5  0.423 NA    0.250   0.192 NA   
 6  0.748 NA    0.161  NA     A    
 7  0.823 NA    0.0170  0.459 NA   
 8 NA     D     0.486  NA     A    
 9  0.685 NA    0.103   0.400 NA   
10 NA     A     0.802  NA     A 

To create the above I used the case_when helper function for each column of interest as shown below

rmkQC <- c("A","D")  # vector of values to screen
    
dat %>%
  mutate(v1 = case_when(v1_a %in% rmkQC ~ NA_real_
                        , TRUE ~ v1)
         , v3 = case_when(v3_a %in% rmkQC ~ NA_real_
                        , TRUE ~ v3))

What I need to do is generalize this so that it can be applied to a data frame with an unknow number of paired columns. Not sure that it matters, but I can identify the paired columns with the below bits of code.

colQC <- names(dat)[grep("_a" , names(dat))]
colV  <- sub("_a$","",colQC)

Upvotes: 3

Views: 7966

Answers (2)

AnilGoyal
AnilGoyal

Reputation: 26218

Though my answer is nearly same as dear Akrun's, still to show you that it an be done in many ways

dat %>%
  mutate(across(c(v1, v3), ~ replace(., get(paste0(cur_column(), '_a')) %in% c('A', 'D'), NA)))

# A tibble: 10 x 5
        v1 v1_a      v2     v3 v3_a 
     <dbl> <chr>  <dbl>  <dbl> <chr>
 1 NA      A     0.652   0.865 NA   
 2 NA      A     0.568  NA     A    
 3  0.427  NA    0.114  NA     D    
 4  0.693  B     0.596   0.356 B    
 5  0.0851 NA    0.358   0.406 NA   
 6  0.225  NA    0.429  NA     A    
 7  0.275  NA    0.0519  0.838 NA   
 8 NA      D     0.264  NA     A    
 9  0.616  NA    0.399   0.771 NA   
10 NA      A     0.836  NA     A 

Upvotes: 2

akrun
akrun

Reputation: 886928

Here is one way to do this in tidyverse

  1. Loop across the columns of interest ('v1', 'v3')
  2. Get the column name (cur_column()), paste (str_c) the suffix part ('_a'), get the column value
  3. Create the logical vector with %in% within case_when and replace that are TRUE from the logical to NA
library(dplyr)
dat1 <- dat %>%
      mutate(across(all_of(colV), ~ case_when(get(str_c(cur_column(), 
             '_a')) %in% rmkQC ~ NA_real_, TRUE ~ .)))

-output

dat1
# A tibble: 10 x 5
        v1 v1_a      v2     v3 v3_a 
     <dbl> <chr>  <dbl>  <dbl> <chr>
 1 NA      A     0.652   0.865 <NA> 
 2 NA      A     0.568  NA     A    
 3  0.427  <NA>  0.114  NA     D    
 4  0.693  B     0.596   0.356 B    
 5  0.0851 <NA>  0.358   0.406 <NA> 
 6  0.225  <NA>  0.429  NA     A    
 7  0.275  <NA>  0.0519  0.838 <NA> 
 8 NA      D     0.264  NA     A    
 9  0.616  <NA>  0.399   0.771 <NA> 
10 NA      A     0.836  NA     A   

In base R we can also do

dat[colV] <-  dat[colV] * NA^`dim<-`(as.matrix(dat[colQC]) %in% 
       rmkQC, dim(dat[colQC]))

Upvotes: 4

Related Questions