stapperen
stapperen

Reputation: 137

Return column names based on condition

I've a dataset with 18 columns from which I need to return the column names with the highest value(s) for each observation, simple example below. I came across this answer, and it almost does what I need, but in some cases I need to combine the names (like abin maxcolbelow). How should I do this?

Any suggestions would be greatly appreciated! If it's possible it would be easier for me to understand a tidyverse based solution as I'm more familiar with that than base.

Edit: I forgot to mention that some of the columns in my data have NAs.

library(dplyr, warn.conflicts = FALSE)

#turn this
Df <- tibble(a = 4:2, b = 4:6, c = 3:5)

#into this
Df <- tibble(a = 4:2, b = 4:6, c = 3:5, maxol = c("ab", "b", "b"))

Created on 2018-10-30 by the reprex package (v0.2.1)

Upvotes: 2

Views: 3253

Answers (2)

mandmeier
mandmeier

Reputation: 409

Here's a solution I found that loops through column names in case you find it hard to wrap your head around spread/gather (pivot_wider/longer)

out_df <- Df %>%
  # calculate rowwise maximum
  rowwise() %>%
  mutate(rowmax = max(across())) %>%
  # create empty maxcol column
  mutate(maxcol = "")
  
# loop through column names
for (colname in colnames(Df)) {
  out_df <- out_df %>%
    # if the value at the specified column name is the maximum, paste it to the maxcol
    mutate(maxcol = ifelse(.data[[colname]] == rowmax, paste0(maxcol, colname), maxcol))
}

# remove rowmax column if no longer needed
out_df <- out_df %>%
  select(-rowmax)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388807

Continuing from the answer in the linked post, we can do

Df$maxcol <- apply(Df, 1, function(x) paste0(names(Df)[x == max(x)], collapse = ""))

Df

#      a     b     c maxcol
#  <int> <int> <int> <chr> 
#1     4     4     3  ab    
#2     3     5     4  b     
#3     2     6     5  b 

For every row, we check which position has max values and paste the names at that position together.


If you prefer the tidyverse approach

library(tidyverse)
Df %>%
  mutate(row = row_number()) %>%
  gather(values, key, -row) %>%
  group_by(row) %>%
  mutate(maxcol = paste0(values[key == max(key)], collapse = "")) %>%
  spread(values, key) %>%
  ungroup() %>%
  select(-row)

# maxcol     a     b     c
#  <chr>  <int> <int> <int>
#1 ab         4     4     3
#2 b          3     5     4
#3 b          2     6     5

We first convert dataframe from wide to long using gather, then group_by each row we paste column names for max key and then spread the long dataframe to wide again.

Upvotes: 5

Related Questions