Mario Niepel

Reputation: 1165

collapsing values across row

I am trying to clean up an irregular dataframe using dplyr functions. My intent here is to evaluate the dataframe by row, and then append a new column that contains all items in that row as a list. I think that a list may be the best container since this means the values can be of any type.

library(tidyverse)
col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)
df <- tibble(col1, col2, col3, col4, col5)  # tibble() replaces the deprecated data_frame()

df %>% rowwise() %>% mutate(list_col=list(across())) -> out1

This solution works. It appends a new column (list_col) that contains all values of the row as a list. My next steps would be to use unlist (without names), remove all NA values, and then combine everything back into a neat data frame.

However, to make my life easier, I was also trying to remove all NA values while running the mutate function. I tried to use select for a while until I realized that select operates on column names only, not on column contents.

So here is what I got to.

library(tidyverse)
col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)
df <- tibble(col1, col2, col3, col4, col5)  # tibble() replaces the deprecated data_frame()

df %>% rowwise() %>% mutate(list_col=list(!is.na(across()))) -> out2

Unfortunately, the line runs all the evaluations correctly, but it then puts the TRUE/FALSE values from the !is.na() evaluation into the list. My question is: how do I change the line so that dplyr uses this evaluation to actually select the corresponding entries of the data frame?

Output:

> print(out2$list_col)
[[1]]
     col1  col2 col3 col4 col5
[1,] TRUE FALSE TRUE TRUE TRUE

[[2]]
      col1 col2  col3 col4  col5
[1,] FALSE TRUE FALSE TRUE FALSE

[[3]]
     col1 col2 col3  col4 col5
[1,] TRUE TRUE TRUE FALSE TRUE

[[4]]
      col1  col2  col3  col4 col5
[1,] FALSE FALSE FALSE FALSE TRUE

Desired output:

[[1]]
     col1   col3   col4 col5
[1,] "text" "text"  17    3

[[2]]
      col2  col4  
[1,]  "text"  22 

[[3]]
     col1    col2   col3  col5
[1,] "text" "text" "text" 3

[[4]]
      col5
[1,]  17

Upvotes: 1

Views: 59

Answers (2)

Ronak Shah

Reputation: 388807

In base R you can use apply with na.omit:

df$list_col <- apply(df, 1, na.omit)
df
#  col1  col2  col3   col4  col5 list_col 
#  <chr> <chr> <chr> <dbl> <dbl> <list>   
#1 text  NA    text     17     3 <chr [4]>
#2 NA    text  NA       22    NA <chr [2]>
#3 text  text  text     NA     3 <chr [4]>
#4 NA    NA    NA       NA    17 <chr [1]>
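The <chr [n]> entries above show that every value, including the numbers, ends up as a character string: apply() converts the data frame to a matrix first, and a matrix can hold only one type. The sketch below illustrates this on the question's data, rebuilt as a plain data.frame; the names df_base, m and rows are made up for this illustration.

```r
# Same data as in the question, as a base data.frame (df_base is a hypothetical name)
df_base <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(df_base); mixing text and numbers gives a character matrix
m <- as.matrix(df_base)
typeof(m)  # "character"

# Dropping the NAs per row therefore returns character vectors of varying length
rows <- apply(df_base, 1, function(x) x[!is.na(x)])
lengths(rows)
```

If the numeric values are needed later, they have to be converted back (for example with as.numeric), or the rows have to be collected in a way that avoids the matrix conversion.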

Upvotes: 0

akrun

Reputation: 886938

We can use pmap from purrr:

library(dplyr)
library(purrr)
df1 <- df %>%
          mutate(list_col = pmap(., ~ c(na.omit(c(...)))))

Output:

df1
# A tibble: 4 x 6
#  col1  col2  col3   col4  col5 list_col 
#  <chr> <chr> <chr> <dbl> <dbl> <list>   
#1 text  <NA>  text     17     3 <chr [4]>
#2 <NA>  text  <NA>     22    NA <chr [2]>
#3 text  text  text     NA     3 <chr [4]>
#4 <NA>  <NA>  <NA>     NA    17 <chr [1]>
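Because c(...) combines the row into a single atomic vector, the numbers are coerced to character here. If the original types should survive, as the question's desired output suggests, one possible variation is to collect each row with list(...) instead. This is only a sketch, not part of the answer above; keep_non_na and typed are hypothetical names.

```r
library(purrr)

# Same data as in the question
df <- tibble::tibble(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17)
)

# Hypothetical helper: gather one row as a named list and drop the NA entries.
# is.na() on a list is TRUE for elements that are a single atomic NA.
keep_non_na <- function(...) {
  row <- list(...)
  row[!is.na(row)]
}

# pmap() calls the helper once per row, passing the columns as named arguments
typed <- pmap(df, keep_non_na)

str(typed[[1]])  # col1 and col3 stay character, col4 and col5 stay numeric
```

The result can be attached as a list column in the same way, for example with mutate(list_col = pmap(., keep_non_na)).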

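Or, if we want to use across with is.na, we have to subset the values with the logical index that is.na returns. The columns are converted to character first so that each row can be collected as a single vector: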

 df %>% 
     mutate(across(everything(), as.character)) %>%
     rowwise() %>% 
     mutate(list_col=list(across()[!is.na(across())]))
# A tibble: 4 x 6
# Rowwise: 
#  col1  col2  col3  col4  col5  list_col 
#  <chr> <chr> <chr> <chr> <chr> <list>   
#1 text  <NA>  text  17    3     <chr [4]>
#2 <NA>  text  <NA>  22    <NA>  <chr [2]>
#3 text  text  text  <NA>  3     <chr [4]>
#4 <NA>  <NA>  <NA>  <NA>  17    <chr [1]>

Another option in base R:

df$list_col <- apply(df, 1, FUN = function(x) x[complete.cases(x)])
 

Upvotes: 1
