Richard Packer

Reputation: 21

Create a column that is a list of named columns efficiently in R

I need to efficiently create a new list column in which each row holds the values of a set of named columns. I do not know the column names in advance, so I cannot type them in directly and need to specify them with a character vector. My current method uses rowwise and c_across, but it is very slow on a large data.frame.

library(dplyr)

df <- data.frame(id=c(23,24,25,26), col_1=c(45,56,7,NA), col_2=c(56,23,222,56), col_3=c(89,NA,NA,NA))

col_names_vector <- colnames(df)[-1]

list_col_df <- df %>%
  rowwise() %>% 
  mutate(list_col=list(c_across(all_of(col_names_vector))))

Once I have the data in a list, I need to extract the first, last, minimum or maximum non-NA value and then find its position in the list. I use that position as an index into another vector. I have tried nest(), but it produces a data frame column, and I cannot perform the operations I need on that.

Any thoughts on ways of improving this code?

Upvotes: 0

Views: 33

Answers (1)

akrun

Reputation: 887851

In base R, you can use asplit with MARGIN = 1 to split the selected columns by row. It returns a list with one element per row, each holding that row's values named by column.

df$list_col <- asplit(df[col_names_vector], 1)
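As a quick check of what this produces, here is a minimal sketch using the example data from the question. The name rows is only illustrative and is not part of the original code.

```r
# Example data and column names from the question
df <- data.frame(id = c(23, 24, 25, 26),
                 col_1 = c(45, 56, 7, NA),
                 col_2 = c(56, 23, 222, 56),
                 col_3 = c(89, NA, NA, NA))
col_names_vector <- colnames(df)[-1]

# asplit() converts the selected columns to a matrix and splits it by row
rows <- asplit(df[col_names_vector], 1)

length(rows)       # one element per row of df
names(rows[[1]])   # the column names are kept as element names
rows[[2]]          # values of the second row, including the NA
```

Because the columns go through a matrix first, this is best suited to columns of a single type; columns of mixed types would be coerced to a common type.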

Or, in the tidyverse, you can use transpose from purrr. Each element is then a named list rather than a vector.

library(dplyr)
library(purrr)
df %>% 
  mutate(list_col = transpose(across(all_of(col_names_vector))))

  id col_1 col_2 col_3   list_col
1 23    45    56    89 45, 56, 89
2 24    56    23    NA 56, 23, NA
3 25     7   222    NA 7, 222, NA
4 26    NA    56    NA NA, 56, NA

Alternatively, use pmap with c, which combines each row's values into a named vector.

df %>%
   mutate(list_col = pmap(across(all_of(col_names_vector)), c))

  id col_1 col_2 col_3   list_col
1 23    45    56    89 45, 56, 89
2 24    56    23    NA 56, 23, NA
3 25     7   222    NA 7, 222, NA
4 26    NA    56    NA NA, 56, NA
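
For the follow-up step in the question (finding the position of the first, last, minimum or maximum non-NA value), one possible sketch in base R is shown here. It is not benchmarked, the new column names (first_pos, last_pos, min_pos, max_pos, first_name) are made up for illustration, and it assumes every row has at least one non-NA value.

```r
# Example data and column names from the question
df <- data.frame(id = c(23, 24, 25, 26),
                 col_1 = c(45, 56, 7, NA),
                 col_2 = c(56, 23, 222, 56),
                 col_3 = c(89, NA, NA, NA))
col_names_vector <- colnames(df)[-1]

# One element per row, built the same way as list_col in the base R approach
rows <- asplit(df[col_names_vector], 1)

# Position of the first and last non-NA value in each row
# (assumption: each row has at least one non-NA value, otherwise vapply() errors)
df$first_pos <- vapply(rows, function(x) which(!is.na(x))[1],
                       integer(1), USE.NAMES = FALSE)
df$last_pos  <- vapply(rows, function(x) tail(which(!is.na(x)), 1),
                       integer(1), USE.NAMES = FALSE)

# which.min() and which.max() ignore NA and return the position of the extreme
df$min_pos <- vapply(rows, which.min, integer(1), USE.NAMES = FALSE)
df$max_pos <- vapply(rows, which.max, integer(1), USE.NAMES = FALSE)

# A position can then index another vector, here the column names
df$first_name <- col_names_vector[df$first_pos]
```

The positions are stored as ordinary integer columns, so they can be used directly to index any other vector of the same length as col_names_vector.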

Upvotes: 1
