Reputation: 986
I have a dataframe with many columns of numerical data, paired with corresponding quality columns.
In the below example, dat
is a data frame with numerical columns v1
and v3
that are paired with the quality control columns v1_a
and v3_a
. v2
does not have a quality control column.
I want to set all the values in v1
and v3
to NA_real_ if the corresponding values in v1_a
and v3_a
are either "A" or "D". I am looking for an approach within dplyr and using %>% that can be extended to any number of paired numeric and quality control columns.
set.seed(10)
dat <- tibble(v1 = runif(10)
, v1_a = c("A", "A", NA, "B", NA, NA, NA, "D", NA, "A" )
, v2 = runif(10)
, v3 = runif(10)
, v3_a = c(NA, "A", "D", "B", NA, "A", NA, "A", NA, "A" ))
dat
v1 v1_a v2 v3 v3_a
1 0.536 A 0.275 0.354 NA
2 0.0931 A 0.229 0.936 A
3 0.170 NA 0.0144 0.246 D
4 0.900 B 0.729 0.473 B
5 0.423 NA 0.250 0.192 NA
6 0.748 NA 0.161 0.583 A
7 0.823 NA 0.0170 0.459 NA
8 0.955 D 0.486 0.467 A
9 0.685 NA 0.103 0.400 NA
10 0.501 A 0.802 0.505 A
For the above example data, and assuming I want to screen for "A" and "D" (and not "B" or NA), the result would be:
v1 v1_a v2 v3 v3_a
1 NA A 0.275 0.354 NA
2 NA A 0.229 NA A
3 0.170 NA 0.0144 NA D
4 0.900 B 0.729 0.473 B
5 0.423 NA 0.250 0.192 NA
6 0.748 NA 0.161 NA A
7 0.823 NA 0.0170 0.459 NA
8 NA D 0.486 NA A
9 0.685 NA 0.103 0.400 NA
10 NA A 0.802 NA A
To create the above I used the case_when
helper function for each column of interest as shown below
rmkQC <- c("A","D") # vector of values to screen
dat %>%
mutate(v1 = case_when(v1_a %in% rmkQC ~ NA_real_
, TRUE ~ v1)
, v3 = case_when(v3_a %in% rmkQC ~ NA_real_
, TRUE ~ v3))
What I need to do is generalize this so that it can be applied to a data frame with an unknow number of paired columns. Not sure that it matters, but I can identify the paired columns with the below bits of code.
colQC <- names(dat)[grep("_a" , names(dat))]
colV <- sub("_a$","",colQC)
Upvotes: 3
Views: 7966
Reputation: 26218
Though my answer is nearly same as dear Akrun's, still to show you that it an be done in many ways
dat %>%
mutate(across(c(v1, v3), ~ replace(., get(paste0(cur_column(), '_a')) %in% c('A', 'D'), NA)))
# A tibble: 10 x 5
v1 v1_a v2 v3 v3_a
<dbl> <chr> <dbl> <dbl> <chr>
1 NA A 0.652 0.865 NA
2 NA A 0.568 NA A
3 0.427 NA 0.114 NA D
4 0.693 B 0.596 0.356 B
5 0.0851 NA 0.358 0.406 NA
6 0.225 NA 0.429 NA A
7 0.275 NA 0.0519 0.838 NA
8 NA D 0.264 NA A
9 0.616 NA 0.399 0.771 NA
10 NA A 0.836 NA A
Upvotes: 2
Reputation: 886928
Here is one way to do this in tidyverse
across
the columns of interest ('v1', 'v3')cur_column()
), paste
(str_c
) the suffix part ('_a'), get
the column value%in%
within case_when
and replace that are TRUE from the logical to NA
library(dplyr)
dat1 <- dat %>%
mutate(across(all_of(colV), ~ case_when(get(str_c(cur_column(),
'_a')) %in% rmkQC ~ NA_real_, TRUE ~ .)))
-output
dat1
# A tibble: 10 x 5
v1 v1_a v2 v3 v3_a
<dbl> <chr> <dbl> <dbl> <chr>
1 NA A 0.652 0.865 <NA>
2 NA A 0.568 NA A
3 0.427 <NA> 0.114 NA D
4 0.693 B 0.596 0.356 B
5 0.0851 <NA> 0.358 0.406 <NA>
6 0.225 <NA> 0.429 NA A
7 0.275 <NA> 0.0519 0.838 <NA>
8 NA D 0.264 NA A
9 0.616 <NA> 0.399 0.771 <NA>
10 NA A 0.836 NA A
In base R
we can also do
dat[colV] <- dat[colV] * NA^`dim<-`(as.matrix(dat[colQC]) %in%
rmkQC, dim(dat[colQC]))
Upvotes: 4