Reputation: 2385
For each column I would like to select the rowname that has the highes value across all rows. Please see my input and output example.
> df.t[,1:5][1:5,]
MMRF_1021 MMRF_1024 MMRF_1029 MMRF_1030 MMRF_1031
ENSG00000004468 8.195680 7.500753 7.912472 8.886945 6.780892
ENSG00000081237 4.401101 6.135663 6.525512 4.496787 7.927844
ENSG00000139193 6.124573 6.585169 5.547023 6.254043 2.764494
ENSG00000156738 -1.491527 2.237000 7.192401 8.032151 -4.253239
ENSG00000174059 -5.663732 -4.477220 -5.663732 -4.237282 -4.792564
out
MMRF_1021 ENSG00000004468
MMRF_1024 ENSG00000004468
MMRF_1029 ENSG00000004468
MMRF_1030 ENSG00000004468
MMRF_1031 ENSG00000081237
Upvotes: 2
Views: 501
Reputation: 886948
In this case, get the column wise index of max value with sapply
, convert the named vector
to a two column data.frame (stack
) and transform
the index to the row names by using that index as numeric index
transform(stack(sapply(df.t, which.max))[2:1], values = row.names(df.t)[values])
# ind values
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237
Or make it a bit more compact
stack(lapply(df.t, function(x) row.names(df.t)[which.max(x)]))
Or using max.col
with stack
stack(setNames(row.names(df.t)[max.col(t(df.t))], names(df.t)))[2:1]
# ind values
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237
The similar option in tidyverse
would be to use summarise_all
to loop over all columns get the index with which.max
, convert to a two column data.frame and change the index
library(dplyr)
library(tidyr)
df.t %>%
summarise_all(which.max) %>%
pivot_longer(everything()) %>%
mutate(value = row.names(df.t)[value])
Or we can avoid the last step with
df.t %>%
summarise_all(~ row.names(df.t)[which.max(.)]) %>%
pivot_longer(everything())
Or another option is to do the pivot_longer
first, then do a group by operation
df.t %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(value = row.names(df.t)[which.max(value)])
Or using map
with enframe
library(purrr)
library(tibble)
map(df.t, ~ row.names(df.t)[which.max(.x)]) %>%
enframe %>%
unnest(c(value))
df.t <- structure(list(MMRF_1021 = c(8.19568, 4.401101, 6.124573, -1.491527,
-5.663732), MMRF_1024 = c(7.500753, 6.135663, 6.585169, 2.237,
-4.47722), MMRF_1029 = c(7.912472, 6.525512, 5.547023, 7.192401,
-5.663732), MMRF_1030 = c(8.886945, 4.496787, 6.254043, 8.032151,
-4.237282), MMRF_1031 = c(6.780892, 7.927844, 2.764494, -4.253239,
-4.792564)), class = "data.frame", row.names = c("ENSG00000004468",
"ENSG00000081237", "ENSG00000139193", "ENSG00000156738", "ENSG00000174059"
))
Upvotes: 0
Reputation: 388817
We can transform the dataframe and use max.col
data.frame(col = names(df), value = rownames(df)[max.col(t(df))])
# col value
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237
data
df <- structure(list(MMRF_1021 = c(8.19568, 4.401101, 6.124573, -1.491527,
-5.663732), MMRF_1024 = c(7.500753, 6.135663, 6.585169, 2.237,
-4.47722), MMRF_1029 = c(7.912472, 6.525512, 5.547023, 7.192401,
-5.663732), MMRF_1030 = c(8.886945, 4.496787, 6.254043, 8.032151,
-4.237282), MMRF_1031 = c(6.780892, 7.927844, 2.764494, -4.253239,
-4.792564)), class = "data.frame", row.names = c("ENSG00000004468",
"ENSG00000081237", "ENSG00000139193", "ENSG00000156738", "ENSG00000174059"))
Upvotes: 1