Reputation: 221
I'm trying to build a string that lists the column names in order of their values.
library(tidyverse)
set.seed(100)
df <- tibble(id = 1:5,
col1 = sample(1:50, 5),
col2 = sample(1:50, 5),
col3 = sample(1:50, 5)) %>%
mutate_at(vars(-id), ~if_else(. <= 20, NA_integer_, .))
# A tibble: 5 x 4
id col1 col2 col3
<int> <int> <int> <int>
1 1 NA 44 NA
2 2 38 23 34
3 3 48 22 NA
4 4 25 NA 48
5 5 NA NA 43
res <- df %>%
add_column(order = c('col2',
'col2_col3_col1',
'col2_col1',
'col1_col3',
'col3'))
# A tibble: 5 x 5
id col1 col2 col3 order
<int> <int> <int> <int> <chr>
1 1 NA 44 NA col2
2 2 38 23 34 col2_col3_col1
3 3 48 22 NA col2_col1
4 4 25 NA 48 col1_col3
5 5 NA NA 43 col3
My current data is in the form of df, and the column I'm trying to add is the order column in res. The order of the elements in the string is determined by the value in each column, skipping NAs. The values are times in days, so I'm trying to identify the sequence in which each ID gets a value in each column. Not all IDs have a value in every column, so there are missing values throughout. I usually work within the tidyverse, but any solution or thoughts would be much appreciated.
Upvotes: 1
Views: 666
Reputation: 886938
An easier option is apply: loop over the rows (MARGIN = 1), remove the NA elements, order the remaining non-NA values, use that index to get the column names, and paste them together
df$order <- apply(df[-1], 1, function(x) {
  x1 <- x[!is.na(x)]  # drop the NAs; the column names stay attached
  paste(names(x1)[order(x1)], collapse = "_")  # names sorted by value
})
df$order
#[1] "col2" "col2_col3_col1" "col2_col1" "col1_col3" "col3"
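To make the steps concrete, here is a small sketch that runs the same logic on single rows from the example data. The helper name order_label is made up for illustration and is not part of any package; the named vectors mimic what apply hands to the function for each row.

```r
# Hypothetical helper: same body as the function passed to apply() above
order_label <- function(x) {
  x1 <- x[!is.na(x)]                       # keep only the non-NA values
  idx <- order(x1)                         # positions from smallest to largest
  paste(names(x1)[idx], collapse = "_")    # column names in that order
}

# Row 2 of the example data: no NAs
row2 <- c(col1 = 38L, col2 = 23L, col3 = 34L)
order(row2)
# [1] 2 3 1
order_label(row2)
# [1] "col2_col3_col1"

# Row 4 of the example data: col2 is NA and gets skipped
row4 <- c(col1 = 25L, col2 = NA, col3 = 48L)
order_label(row4)
# [1] "col1_col3"
```

Because the NAs are removed before order is called, the index refers to positions in the filtered vector, which is why the names are taken from x1 and not from x.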
Or using tidyverse
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = -id, values_drop_na = TRUE) %>%
arrange(id, value) %>%
group_by(id) %>%
summarise(order = str_c(name, collapse="_")) %>%
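right_join(df, by = "id") %>%  # explicit key avoids the "Joining, by" message
select(all_of(names(df)), order)  # original columns first, then order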
# A tibble: 5 x 5
# id col1 col2 col3 order
# <int> <int> <int> <int> <chr>
#1 1 NA 44 NA col2
#2 2 38 23 34 col2_col3_col1
#3 3 48 22 NA col2_col1
#4 4 25 NA 48 col1_col3
#5 5 NA NA 43 col3
Or using pmap from purrr
library(purrr)
df %>%
  mutate(order = pmap_chr(select(., starts_with('col')), ~ {
    x <- c(...)          # named vector holding this row's values
    x1 <- x[!is.na(x)]   # drop the NAs
    str_c(names(x1)[order(x1)], collapse = "_")
  }))
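One caveat when reproducing this: sample() can return different values for the same seed depending on the R version (the default sampling method changed in R 3.6.0), so set.seed(100) may not give the table printed in the question. The sketch below types in the printed values by hand, using a plain data.frame instead of a tibble to stay dependency-free, and checks the apply version against the expected strings.

```r
# Values copied by hand from the table printed in the question
df <- data.frame(
  id   = 1:5,
  col1 = c(NA, 38L, 48L, 25L, NA),
  col2 = c(44L, 23L, 22L, NA, NA),
  col3 = c(NA, 34L, NA, 48L, 43L)
)

# Same base R approach as above
df$order <- apply(df[-1], 1, function(x) {
  x1 <- x[!is.na(x)]
  paste(names(x1)[order(x1)], collapse = "_")
})

# The order column the question asks for
expected <- c("col2", "col2_col3_col1", "col2_col1", "col1_col3", "col3")
stopifnot(identical(unname(df$order), expected))
```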
Upvotes: 1