Reputation: 21
I need to efficiently create a new column that is a list of the values from a set of named columns. I do not know the column names in advance, so I cannot type them in directly and need to specify them with a vector. My existing method uses rowwise and c_across, but it is very slow on a large data.frame.
library(dplyr)
df <- data.frame(id=c(23,24,25,26), col_1=c(45,56,7,NA), col_2=c(56,23,222,56), col_3=c(89,NA,NA,NA))
col_names_vector <- colnames(df)[-1]
list_col_df <- df %>%
  rowwise() %>%
  mutate(list_col=list(c_across(all_of(col_names_vector))))
Once I have the data in a list, I need to be able to extract the first, last, minimum or maximum non-NA value and then find its position in the list. I use that index with another vector. I have tried using nest(), but as that produces a data frame I cannot perform the operations I require on a data frame column.
Any thoughts on ways of improving this code?
Upvotes: 0
Views: 33
Reputation: 887851
In base R, you can use asplit with MARGIN = 1 for a row-wise split. It returns a list of named vectors:
df$list_col <- asplit(df[col_names_vector], 1)
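As a quick check of what that produces, here is a minimal sketch that assumes the `df` and `col_names_vector` from the question (the name `first_row` is only illustrative). Each element of `list_col` carries the names of its source columns, so a position inside an element maps straight back to `col_names_vector`.

```r
# Data and column vector copied from the question
df <- data.frame(id = c(23, 24, 25, 26), col_1 = c(45, 56, 7, NA),
                 col_2 = c(56, 23, 222, 56), col_3 = c(89, NA, NA, NA))
col_names_vector <- colnames(df)[-1]

# asplit() coerces the selected columns to a matrix, then splits it by row
df$list_col <- asplit(df[col_names_vector], 1)

# Illustrative: pull out the first row's values and look at their names
first_row <- df$list_col[[1]]
names(first_row)        # the names of the source columns
as.vector(first_row)    # the plain numeric values for that row
```

Because of the matrix coercion, this works best when the selected columns all share one type, as they do here.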
Or in the tidyverse, you can also use transpose from purrr:
library(dplyr)
library(purrr)
df %>%
  mutate(list_col = transpose(across(all_of(col_names_vector))))
  id col_1 col_2 col_3   list_col
1 23    45    56    89 45, 56, 89
2 24    56    23    NA 56, 23, NA
3 25     7   222    NA 7, 222, NA
4 26    NA    56    NA NA, 56, NA
Or you may use pmap:
df %>%
  mutate(list_col = pmap(across(all_of(col_names_vector)), c))
  id col_1 col_2 col_3   list_col
1 23    45    56    89 45, 56, 89
2 24    56    23    NA 56, 23, NA
3 25     7   222    NA 7, 222, NA
4 26    NA    56    NA NA, 56, NA
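For the follow-up step in the question (finding the position of the first, last, minimum or maximum non-NA value), one possible sketch is below. The helper `pos_non_na` and its `what` argument are hypothetical names introduced here, not part of any package, and the example rebuilds the rows with asplit so that it runs on its own. It assumes numeric columns.

```r
# Hypothetical helper (not from any package): position of the first, last,
# minimum or maximum non-NA value in one row's values.
pos_non_na <- function(x, what = c("first", "last", "min", "max")) {
  what <- match.arg(what)
  x <- unlist(x)                 # also accepts the list rows made by transpose()
  ok <- which(!is.na(x))         # positions that hold a non-NA value
  if (length(ok) == 0) return(NA_integer_)   # row is entirely NA
  out <- switch(what,
    first = ok[1],
    last  = ok[length(ok)],
    min   = which.min(x),        # which.min()/which.max() ignore NA values
    max   = which.max(x)
  )
  unname(out)
}

# Data and column vector copied from the question
df <- data.frame(id = c(23, 24, 25, 26), col_1 = c(45, 56, 7, NA),
                 col_2 = c(56, 23, 222, 56), col_3 = c(89, NA, NA, NA))
col_names_vector <- colnames(df)[-1]
rows <- asplit(df[col_names_vector], 1)

# One integer position per row
first_pos <- vapply(rows, pos_non_na, integer(1), what = "first", USE.NAMES = FALSE)
max_pos   <- vapply(rows, pos_non_na, integer(1), what = "max", USE.NAMES = FALSE)

# Use the position as an index into another vector, here the column names
max_col <- col_names_vector[max_pos]
```

Here `max_col` holds the name of the column containing each row's maximum; any other vector of the same length as `col_names_vector` could be indexed the same way.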
Upvotes: 1