Reputation: 802
Let's consider markers
with their coefficient of variation (cv
) and three reference cv (rcv
):
Initial data:
marker cv rcv1 rcv2 rcv3
<chr> <dbl> <dbl> <dbl> <dbl>
1 AAA 7 10 8 5
2 BBB 4 5 3 1
3 CCC 11 20 15 12
4 DDD 8 7 5 2
I would like to mutate three new variables:
rcv_value
: the closest rcv
value greater than the cv
rcv_name
: the column name of that rcv_value
cv_conclusion
:
cv
is lower than one or the other of the rcvs
cv
is higher than the highest rcv
Desired output:
marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 AAA 7 10 8 5 8 rcv2 ok
2 BBB 4 5 3 1 5 rcv1 ok
3 CCC 11 20 15 12 12 rcv3 ok
4 DDD 8 7 5 2 7 rcv1 ko
NB: my real data has more than 100 markers
and about 10 different rcv
.
Where I fail is getting the rcv_name
from the corresponding rcv_value
(using mutate
and case_when
).
Thanks for help.
Data:
dat0 <-
structure(list(marker = c("AAA", "BBB", "CCC", "DDD"), cv = c(7,
4, 11, 8), rcv1 = c(10, 5, 20, 7), rcv2 = c(8, 3, 15, 5), rcv3 = c(5,
1, 12, 2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
Upvotes: 8
Views: 297
Reputation: 35604
You can use this tidyverse
manner that combines dplyr::mutate
and purrr::pmap
:
library(dplyr)
dat0 %>%
mutate(
purrr::pmap_dfr(pick(cv, rcv1:rcv3), ~ {
x <- c(...)[-1]
tibble::enframe(x[order(x < ..1, abs(x - ..1), -x)][1],
name = "rcv_name", value = "rcv_value")
}), cv_conclusion = ifelse(rcv_value >= cv, "ok", "ko")
)
or the following dplyr
-only alternative:
dat0 %>%
rowwise() %>%
mutate(rcv_value = {
x <- c_across(rcv1:rcv3)
x[order(x < cv, abs(x - cv), -x)][1]
}) %>%
ungroup() %>%
mutate(
rcv_name = do.call(coalesce, across(rcv1:rcv3, ~ ifelse(.x == rcv_value, cur_column(), NA))),
cv_conclusion = ifelse(rcv_value >= cv, "ok", "ko")
)
# # A tibble: 4 × 8
# marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 AAA 7 10 8 5 8 rcv2 ok
# 2 BBB 4 5 3 1 5 rcv1 ok
# 3 CCC 11 20 15 12 12 rcv3 ok
# 4 DDD 8 7 5 2 7 rcv1 ko
Upvotes: 7
Reputation: 20494
This is a case where reshaping to long makes it much more straightforward. First use tidyr::pivot_longer()
, then dplyr::mutate(.by = marker)
to perform these operations by group, and then tidyr::pivot_wider()
to return the data to the original shape.
library(dplyr)
library(tidyr)
dat0 |>
pivot_longer(-c(marker, cv)) |>
mutate(
rcv_value = min(value[value > cv]),
# rcv_value will be Inf if no values > cv
cv_conclusion = if_else(is.infinite(rcv_value), "ko", "ok"),
rcv_value = if_else(is.infinite(rcv_value), max(value), rcv_value),
rcv_name = name[rcv_value == value],
.by = marker
) |>
# reshape back to wide
pivot_wider(id_cols = c(marker, cv, rcv_value, rcv_name, cv_conclusion)) |>
# reorder columns as desired
relocate(marker, cv, rcv1:rcv3, rcv_value:cv_conclusion)
# # A tibble: 4 × 8
# marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 AAA 7 10 8 5 8 rcv2 ok
# 2 BBB 4 5 3 1 5 rcv1 ok
# 3 CCC 11 20 15 12 12 rcv3 ok
# 4 DDD 8 7 5 2 7 rcv1 ko
If you can have reference cv ties (i.e. repeated values in the same row in rcv1:rcv3
columns) you'll have to specify which you want rcv_name
to include. If you just want the first value you could change that line to rcv_name = name[rcv_value == value][1]
. Alternatively, if you are doing further processing you could create a list column to store all the values, e.g. list(name[rcv_value == value])
(though I might prefer to just keep the table in long form). Alternatively, if this table is an output, you might want to paste them together e.g. rcv_name = paste(name[rcv_value == value], collapse = ",")
to create results in the column such as "rcv2,rcv3"
.
Upvotes: 6
Reputation: 102529
You can try the following base R option, using max.col
and colMeans
rcvcols <- startsWith(names(dat0), "rcv")
u <- abs((d <- dat0[rcvcols] - dat0$cv) / (d > 0))
idx <- max.col(-u, "first")
dat0$rcv_value <- as.matrix(dat0[rcvcols])[cbind(1:nrow(u), idx)]
dat0$rcv_name <- names(u)[idx]
dat0$cv_conclusion <- c("ko", "ok")[1 + (colMeans(is.infinite(t(u))) < 1)]
which gives
> dat0
# A tibble: 4 × 8
marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 AAA 7 10 8 5 8 rcv2 ok
2 BBB 4 5 3 1 5 rcv1 ok
3 CCC 11 20 15 12 12 rcv3 ok
4 DDD 8 7 5 2 7 rcv1 ko
Upvotes: 6
Reputation: 19339
With a user-defined function:
RCV <- function(dat) {
library(Rfast)
dat <- data.frame(dat)
rcv <- grep('rcv', names(dat))
M <- as.matrix(dat[,rcv] - dat$cv)
M[M<0] <- NA
rmins <- rowMins(M)
dat$rcv_value <- dat[,rcv][cbind(1:nrow(dat), rmins)]
dat$rcv_name <- names(dat[,rcv])[rmins]
dat$cv_conclusion <- ifelse(dat0$cv < dat$rcv_value, "ok", "ko")
dat
}
RCV(dat0)
___
marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
1 AAA 7 10 8 5 8 rcv2 ok
2 BBB 4 5 3 1 5 rcv1 ok
3 CCC 11 20 15 12 12 rcv3 ok
4 DDD 8 7 5 2 7 rcv1 ko
Upvotes: 3