Reputation: 51
suppose I have the following data:
data <- data.frame(Opt_1 = c("AA", "GG", "CC", "GG"),
Opt_2 = c("AT", "GC", "CG", "CG"),
Opt_3 = c("TA", "CG", "GC", "GC"),
Opt_4 = c("TT", "CC", "GG", "CC"),
prob_1 = c(0.025, 0.025, 0.925, 0.025),
prob_2 = c(0.025, 0.925, 0.025, 0.025),
prob_3 = c(0.925, 0.025, 0.025, 0.025),
prob_4 = c(0.025, 0.025, 0.025, 0.925))
Opt_1 Opt_2 Opt_3 Opt_4 prob_1 prob_2 prob_3 prob_4
AA AT TA TT 0.025 0.025 0.925 0.025
GG GC CG CC 0.025 0.925 0.025 0.025
CC CG GC GG 0.025 0.025 0.025 0.925
GG CG GC CC 0.925 0.025 0.025 0.025
I would like to define a new column that captures, for each row what is the Opt with maximum prob. Giving an output like the following:
Opt_1 Opt_2 Opt_3 Opt_4 prob_1 prob_2 prob_3 prob_4 out
AA AT TA TT 0.025 0.025 0.925 0.025 TA
GG GC CG CC 0.025 0.925 0.025 0.025 GC
CC CG GC GG 0.025 0.025 0.025 0.925 GG
GG CG GC CC 0.925 0.025 0.025 0.025 GG
I wrote a for()
loop but it is taking forever as the data base is pretty large.
Upvotes: 1
Views: 58
Reputation: 96
Oneliner in base R:
data$max <- apply(data, 1, function(x) { x[which(x == max(x[5:8]))[1] -4] })
gives (on your data):
> data
Opt_1 Opt_2 Opt_3 Opt_4 prob_1 prob_2 prob_3 prob_4 max
1 AA AT TA TT 0.025 0.025 0.925 0.025 TA
2 GG GC CG CC 0.025 0.925 0.025 0.925 GC
3 CC CG GC GG 0.025 0.025 0.025 0.025 CC
4 GG CG GC CC 0.925 0.025 0.025 0.025 GG
Upvotes: 2
Reputation: 101064
Here is a base R solution
data$out <- mapply(function(i,j) data[grep("Opt_",names(data))][i,j],
seq(nrow(data)),
max.col(data[grep("prob_",names(data))]))
such that
> data
Opt_1 Opt_2 Opt_3 Opt_4 prob_1 prob_2 prob_3 prob_4 out
1 AA AT TA TT 0.025 0.025 0.925 0.025 TA
2 GG GC CG CC 0.025 0.925 0.025 0.025 GC
3 CC CG GC GG 0.025 0.025 0.025 0.925 GG
4 GG CG GC CC 0.925 0.025 0.025 0.025 GG
Upvotes: 1
Reputation: 32548
nm = paste0("Opt_", max.col(data[paste0("prob_", 1:4)], ties.method = "first"))
as.character(sapply(seq_along(nm), function(i) data[i, nm[i]]))
#[1] "TA" "GC" "CC" "GG"
Upvotes: 3
Reputation: 160407
Here's a tidyverse solution:
library(dplyr)
library(tidyr)
# uniquely identify each row, for joining back in later
dat$rn <- seq_len(nrow(dat))
as_tibble(dat) %>%
pivot_longer(
cols = -rn,
names_to = c(".value", "set"),
names_pattern = "(.+)_(.)"
) %>%
group_by(rn) %>%
slice(which.max(prob)) %>%
select(rn, out = Opt) %>%
left_join(dat, ., by = "rn")
# Opt_1 Opt_2 Opt_3 Opt_4 prob_1 prob_2 prob_3 prob_4 rn out
# 1 AA AT TA TT 0.025 0.025 0.925 0.025 1 TA
# 2 GG GC CG CC 0.025 0.925 0.025 0.925 2 GC
# 3 CC CG GC GG 0.025 0.025 0.025 0.025 3 CC
# 4 GG CG GC CC 0.925 0.025 0.025 0.025 4 GG
Upvotes: 1