Paryl
Paryl

Reputation: 221

R Concatenate column names into new column while sorting by their value

I'm trying to concatenate a string that identifies the order of the columns by their value.

set.seed(100)

df <- tibble(id = 1:5,
             col1 = sample(1:50, 5),
             col2 = sample(1:50, 5),
             col3 = sample(1:50, 5)) %>% 
  mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))

# A tibble: 5 x 4
     id  col1  col2  col3
  <int> <int> <int> <int>
1     1    NA    44    NA
2     2    38    23    34
3     3    48    22    NA
4     4    25    NA    48
5     5    NA    NA    43

res <- df %>% 
  add_column(order = c('col2',
                       'col2_col3_co1',
                       'col2_col1',
                       'col1_col3',
                       'col3'))

# A tibble: 5 x 5
     id  col1  col2  col3 order        
  <int> <int> <int> <int> <chr>        
1     1    NA    44    NA col2         
2     2    38    23    34 col2_col3_co1
3     3    48    22    NA col2_col1    
4     4    25    NA    48 col1_col3    
5     5    NA    NA    43 col3 

My current data is in the form of df while the column I'm trying to add is the order column in res. The ordering of the elements in the string is determined by the value of each column, and it also needs to skip over NAs. I'm trying to identify the sequence that each ID takes to populate a value in each column as the values are time in days. However, not all IDs will have a value in all columns, so there's missing values throughout. I usually work within tidyverse, but any solution or thoughts would be much appreciated.

Upvotes: 1

Views: 666

Answers (1)

akrun
akrun

Reputation: 886938

An easier option is apply, loop over the rows (MARGIN = 1), remove the NA elements, order the rest of the non-NA, use the index to get the column names and paste them together

df$order <- apply(df[-1], 1, function(x) {x1 <- x[!is.na(x)]
           paste(names(x1)[order(x1)], collapse="_")})
df$order
#[1] "col2"           "col2_col3_col1" "col2_col1"      "col1_col3"      "col3" 

Or using tidyverse

library(dplyr)
library(tidyr)
library(stringr)
df %>%
   pivot_longer(cols = -id, values_drop_na = TRUE) %>%
   arrange(id,  value) %>%
   group_by(id) %>%
   summarise(order = str_c(name, collapse="_")) %>% 
   right_join(df) %>%
   select(names(df), order)
# A tibble: 5 x 5
#     id  col1  col2  col3 order         
#  <int> <int> <int> <int> <chr>         
#1     1    NA    44    NA col2          
#2     2    38    23    34 col2_col3_col1
#3     3    48    22    NA col2_col1     
#4     4    25    NA    48 col1_col3     
#5     5    NA    NA    43 col3       

Or using pmap from purrr

library(purrr)
df %>% 
   mutate(order = pmap_chr(select(., starts_with('col')), ~
         {x <- c(...)
         x1 <- x[!is.na(x)]
         str_c(names(x1)[order(x1)], collapse="_")}))

Upvotes: 1

Related Questions