user2300940
user2300940

Reputation: 2385

Pick row with highest value and return rowname

For each column I would like to select the rowname that has the highes value across all rows. Please see my input and output example.

> df.t[,1:5][1:5,]
                MMRF_1021 MMRF_1024 MMRF_1029 MMRF_1030 MMRF_1031
ENSG00000004468  8.195680  7.500753  7.912472  8.886945  6.780892
ENSG00000081237  4.401101  6.135663  6.525512  4.496787  7.927844
ENSG00000139193  6.124573  6.585169  5.547023  6.254043  2.764494
ENSG00000156738 -1.491527  2.237000  7.192401  8.032151 -4.253239
ENSG00000174059 -5.663732 -4.477220 -5.663732 -4.237282 -4.792564

out

MMRF_1021 ENSG00000004468
MMRF_1024 ENSG00000004468
MMRF_1029 ENSG00000004468 
MMRF_1030 ENSG00000004468
MMRF_1031 ENSG00000081237 

Upvotes: 2

Views: 501

Answers (2)

akrun
akrun

Reputation: 886948

In this case, get the column wise index of max value with sapply, convert the named vector to a two column data.frame (stack) and transform the index to the row names by using that index as numeric index

transform(stack(sapply(df.t, which.max))[2:1], values = row.names(df.t)[values])
#   ind          values
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237

Or make it a bit more compact

stack(lapply(df.t, function(x) row.names(df.t)[which.max(x)]))

Or using max.col with stack

stack(setNames(row.names(df.t)[max.col(t(df.t))],  names(df.t)))[2:1]
#      ind          values
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237

The similar option in tidyverse would be to use summarise_all to loop over all columns get the index with which.max, convert to a two column data.frame and change the index

library(dplyr)
library(tidyr)
df.t %>% 
   summarise_all(which.max) %>%
   pivot_longer(everything()) %>%
   mutate(value = row.names(df.t)[value])

Or we can avoid the last step with

df.t %>%
   summarise_all(~ row.names(df.t)[which.max(.)]) %>%
   pivot_longer(everything())

Or another option is to do the pivot_longer first, then do a group by operation

df.t %>% 
   pivot_longer(everything()) %>% 
   group_by(name) %>% 
   summarise(value = row.names(df.t)[which.max(value)])

Or using map with enframe

library(purrr)
library(tibble)
map(df.t, ~ row.names(df.t)[which.max(.x)]) %>%
     enframe %>%
     unnest(c(value))

data

df.t <- structure(list(MMRF_1021 = c(8.19568, 4.401101, 6.124573, -1.491527, 
-5.663732), MMRF_1024 = c(7.500753, 6.135663, 6.585169, 2.237, 
-4.47722), MMRF_1029 = c(7.912472, 6.525512, 5.547023, 7.192401, 
-5.663732), MMRF_1030 = c(8.886945, 4.496787, 6.254043, 8.032151, 
-4.237282), MMRF_1031 = c(6.780892, 7.927844, 2.764494, -4.253239, 
-4.792564)), class = "data.frame", row.names = c("ENSG00000004468", 
"ENSG00000081237", "ENSG00000139193", "ENSG00000156738", "ENSG00000174059"
))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388817

We can transform the dataframe and use max.col

data.frame(col = names(df), value  = rownames(df)[max.col(t(df))])

#        col           value
#1 MMRF_1021 ENSG00000004468
#2 MMRF_1024 ENSG00000004468
#3 MMRF_1029 ENSG00000004468
#4 MMRF_1030 ENSG00000004468
#5 MMRF_1031 ENSG00000081237

data

df <- structure(list(MMRF_1021 = c(8.19568, 4.401101, 6.124573, -1.491527, 
-5.663732), MMRF_1024 = c(7.500753, 6.135663, 6.585169, 2.237, 
-4.47722), MMRF_1029 = c(7.912472, 6.525512, 5.547023, 7.192401, 
-5.663732), MMRF_1030 = c(8.886945, 4.496787, 6.254043, 8.032151, 
-4.237282), MMRF_1031 = c(6.780892, 7.927844, 2.764494, -4.253239, 
 -4.792564)), class = "data.frame", row.names = c("ENSG00000004468", 
"ENSG00000081237", "ENSG00000139193", "ENSG00000156738", "ENSG00000174059"))

Upvotes: 1

Related Questions