I am trying to clean up an irregular data frame using dplyr functions. My intent is to evaluate the data frame row by row and then append a new column that contains all the items in that row as a list. I think a list may be the best container, since it allows the values to be of any type.
library(tidyverse)
col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)
df <- tibble(col1, col2, col3, col4, col5)  # tibble() replaces the deprecated data_frame()
df %>% rowwise() %>% mutate(list_col=list(across())) -> out1
This solution works. It appends a new column (list_col) that contains all the values of each row in a list. My next steps would be to use unlist (without names), remove all NA values, and then combine everything back into a neat data frame.
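The planned next steps (unlist without names, then drop the NA values) can be sketched in base R on a single row. This is a minimal sketch that types one row of the example data by hand as a named list, standing in for one element of list_col. Note that unlist() coerces everything to a common type, so the numbers come back as character strings.

```r
# One row of the example data as a named list (stand-in for one list_col element)
row_list <- list(col1 = "text", col2 = NA_character_, col3 = "text",
                 col4 = 17, col5 = 3)

# Flatten without names; mixing character and numeric coerces to character
flat <- unlist(row_list, use.names = FALSE)

# Keep only the non-missing entries
flat <- flat[!is.na(flat)]
# flat is c("text", "text", "17", "3")
```

If keeping the original types matters, the unlist() step is the one to avoid, because it is where the numbers become strings.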
However, to make my life easier, I was also trying to remove all NA values while running the mutate function. I tried to use select for a while until I realized that select only operates on column names, not on the contents of the columns.
Here is what I came up with:
df %>% rowwise() %>% mutate(list_col=list(!is.na(across()))) -> out2
Unfortunately, although the line runs all the evaluations correctly, it then puts the TRUE/FALSE values from the !is.na() evaluation into the list instead of the data itself. My question is: how do I change the line so that dplyr uses this evaluation to actually select the corresponding entries of the data frame?
Output:
> print(out2$list_col)
[[1]]
col1 col2 col3 col4 col5
[1,] TRUE FALSE TRUE TRUE TRUE
[[2]]
col1 col2 col3 col4 col5
[1,] FALSE TRUE FALSE TRUE FALSE
[[3]]
col1 col2 col3 col4 col5
[1,] TRUE TRUE TRUE FALSE TRUE
[[4]]
col1 col2 col3 col4 col5
[1,] FALSE FALSE FALSE FALSE TRUE
Desired output:
[[1]]
col1 col3 col4 col5
[1,] "text" "text" 17 3
[[2]]
col2 col4
[1,] "text" 22
[[3]]
col1 col2 col3 col5
[1,] "text" "text" "text" 3
[[4]]
col5
[1,] 17
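The missing step is to use the logical vector from !is.na() as an index instead of returning it. A minimal base R sketch, assuming a plain data.frame instead of the tibble and using a hypothetical helper named drop_na_row: is.na() applied to a list returns TRUE for length-one NA elements, so indexing each row-as-list with !is.na() keeps only the non-missing entries. Because the row stays a list, each value keeps its original type, which the character-vector approaches do not.

```r
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn row i into a named list and keep the non-NA entries
drop_na_row <- function(i) {
  row <- as.list(df[i, ])
  row[!is.na(row)]   # use the logical result as an index, not as the value
}

# Store the per-row lists as a list column
df$list_col <- lapply(seq_len(nrow(df)), drop_na_row)
```

For example, df$list_col[[2]] holds only col2 ("text") and col4 (22), and col4 is still numeric.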
In base R, you can use apply with na.omit:
df$list_col <- apply(df, 1, na.omit)
df
# col1 col2 col3 col4 col5 list_col
# <chr> <chr> <chr> <dbl> <dbl> <list>
#1 text NA text 17 3 <chr [4]>
#2 NA text NA 22 NA <chr [2]>
#3 text text text NA 3 <chr [4]>
#4 NA NA NA NA 17 <chr [1]>
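One caveat with apply(): it returns a list here only because the rows keep different numbers of values (4, 2, 4 and 1). If every row kept the same number of non-missing values, apply() would simplify the result to a vector or matrix. A small sketch with a made-up two-row data frame df2 shows this, together with an lapply() over row indices that always returns a list:

```r
# Made-up example: every row has exactly one non-missing value
df2 <- data.frame(a = c("x", NA), b = c(NA, "y"), stringsAsFactors = FALSE)

res_apply <- apply(df2, 1, na.omit)
is.list(res_apply)    # FALSE: equal-length results get simplified

# Iterating over row indices with lapply() always yields a list
res_lapply <- lapply(seq_len(nrow(df2)), function(i) {
  x <- unlist(df2[i, ])   # named character vector for row i
  x[!is.na(x)]
})
is.list(res_lapply)   # TRUE
```

In R 4.1.0 and later, apply() also accepts simplify = FALSE, which keeps the list form directly.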
We can use pmap from purrr:
library(dplyr)
library(purrr)
df1 <- df %>%
mutate(list_col = pmap(., ~ c(na.omit(c(...)))))
Output:
df1
# A tibble: 4 x 6
# col1 col2 col3 col4 col5 list_col
# <chr> <chr> <chr> <dbl> <dbl> <list>
#1 text <NA> text 17 3 <chr [4]>
#2 <NA> text <NA> 22 NA <chr [2]>
#3 text text text NA 3 <chr [4]>
#4 <NA> <NA> <NA> NA 17 <chr [1]>
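The outer c() in ~ c(na.omit(c(...))) is there because na.omit() attaches an na.action attribute recording which positions were dropped, and c() strips every attribute except names. A small base R sketch on one row's values, typed by hand here rather than produced by pmap:

```r
# One row's values after c(...) has combined them (numbers coerced to character)
vals <- c(col1 = "text", col2 = NA, col3 = "text", col4 = "17", col5 = "3")

omitted <- na.omit(vals)
attr(omitted, "na.action")   # records the dropped position (col2)

cleaned <- c(omitted)        # c() keeps names but drops the na.action attribute
attr(cleaned, "na.action")   # NULL
```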
Or, if we want to use across with is.na, make sure we subset the values with the logical index from is.na:
df %>%
mutate(across(everything(), as.character)) %>%
rowwise() %>%
mutate(list_col=list(across()[!is.na(across())]))
# A tibble: 4 x 6
# Rowwise:
# col1 col2 col3 col4 col5 list_col
# <chr> <chr> <chr> <chr> <chr> <list>
#1 text <NA> text 17 3 <chr [4]>
#2 <NA> text <NA> 22 <NA> <chr [2]>
#3 text text text <NA> 3 <chr [4]>
#4 <NA> <NA> <NA> <NA> 17 <chr [1]>
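The subsetting works because is.na() on a data frame returns a logical matrix of the same shape, and indexing with that matrix returns only the matching cells as a vector. A base R analogue on a single all-character row (a plain data.frame typed by hand, not the rowwise tibble that across() returns):

```r
# One row, already converted to character as in the answer above
row <- data.frame(col1 = "text", col2 = NA_character_, col3 = "text",
                  col4 = "17", col5 = "3", stringsAsFactors = FALSE)

mask <- !is.na(row)   # 1 x 5 logical matrix
kept <- row[mask]     # logical-matrix indexing returns the non-NA cells
# kept is c("text", "text", "17", "3")
```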
Another option in base R:
df$list_col <- apply(df, 1, FUN = function(x) x[complete.cases(x)])
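Applied to a single vector, complete.cases(x) gives the same logical index as !is.na(x), so x[complete.cases(x)] keeps the non-missing values. A quick base R check on one row's values (typed by hand):

```r
x <- c(col1 = "text", col2 = NA, col3 = "text", col4 = "17", col5 = NA)

# Same TRUE/FALSE pattern as !is.na(x)
all(complete.cases(x) == !is.na(x))   # TRUE

# Keeps col1, col3 and col4, with their names
x[complete.cases(x)]
```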