EML

Reputation: 671

Combine multiple columns into vector by row with dplyr

I am trying to combine multiple columns into a single cell for each row and then remove missing values.

Sample data:

df <- data.frame(a=c("a", "b", "c", "d"),
                 b=c(NA, "a", "b", "c"),
                 c=c("a", "b", "e", "g"))

Attempt:

df %>% rowwise() %>%
mutate(collapse=as.character(paste(a,b,c, collapse=",")),
       collapse_nona=na.omit(collapse))

Output:

# A tibble: 4 x 5
  a     b     c     collapse                collapse_nona         
* <fct> <fct> <fct> <chr>                   <chr>                 
1 a     NA    a     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
2 b     a     b     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
3 c     b     e     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …
4 d     c     g     a NA a,b a b,c b e,d c… a NA a,b a b,c b e,d …

1) I am not successfully creating cells with values for each row (the whole column appears in collapse).

2) Cells in the collapse column do not behave like a vector.

Desired output:

  a     b     c     collapse                collapse_nona         
* <fct> <fct> <fct> <chr>                   <chr>                 
1 a     NA    a     a NA a                  a a
2 b     a     b     b a b                   b a b
3 c     b     e     c b e                   c b e
4 d     c     g     d c g                   d c g

Thank you

Upvotes: 2

Views: 2250

Answers (3)

hammoire

Reputation: 361

I think this does it. You could play around with the sep argument in str_c.

library(dplyr)
library(stringr)
df %>% 
  mutate(collapse = str_c(str_replace_na(a), str_replace_na(b), str_replace_na(c), sep = " "),
         collapse_nona = str_c(str_replace_na(a, ""), str_replace_na(b, ""), str_replace_na(c,""), sep = " "))

  a    b c collapse collapse_nona
1 a <NA> a   a NA a          a  a
2 b    a b    b a b         b a b
3 c    b e    c b e         c b e
4 d    c g    d c g         d c g
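
Row 1 of collapse_nona has two spaces because the empty string that replaces NA is still joined with sep on both sides. Here is a minimal base R sketch of one way to tidy that up, assuming extra whitespace is never meaningful in your data. squeeze_spaces is a hypothetical helper name, not a stringr function, and the input vector is made up for illustration:

```r
# Hypothetical helper: collapse runs of spaces into one and trim both ends.
squeeze_spaces <- function(x) {
  trimws(gsub(" +", " ", x))
}

# Illustrative strings of the kind produced when NA is replaced by "".
collapse_nona <- c("a  a", "b a b", " c e", "d c ")
cleaned <- squeeze_spaces(collapse_nona)
print(cleaned)
```

If you would rather stay within stringr, str_squish should do the same job.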

Upvotes: 0

astrofunkswag

Reputation: 2698

I think the core issue is that you want sep, not collapse: sep separates the pieces within each row, while collapse joins the whole result into a single string. With sep, the rowwise calculation is unnecessary. Also, paste converts NA to the string "NA", so na.omit can no longer remove it.

df %>% 
   mutate(collapse = paste(a,b,c, sep = " "), collapse_nona = gsub("NA", "", collapse))

  a    b c collapse collapse_nona
1 a <NA> a   a NA a          a  a
2 b    a b    b a b         b a b
3 c    b e    c b e         c b e
4 d    c g    d c g         d c g
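
To see the sep/collapse difference in isolation, here is a small base R sketch on toy vectors (x and y are made up for illustration). It also shows why na.omit has nothing to remove after paste, and one caveat of the plain gsub("NA", ...) approach: it also strips "NA" from inside longer values. The word-boundary pattern is just one possible workaround, not the only one.

```r
x <- c("a", "b")
y <- c(NA, "a")

# sep joins element-wise: the result has one string per row.
by_sep <- paste(x, y, sep = " ")

# collapse additionally glues all rows into a single string.
by_collapse <- paste(x, y, collapse = ",")

# NA is already the text "NA" here, so na.omit() drops nothing.
after_omit <- as.character(na.omit(by_sep))

# Assumed example value that merely contains the letters "NA".
risky <- gsub("NA", "", "NANCY NA b")
# Only remove "NA" when it stands alone as a word.
safer <- gsub("\\bNA\\b", "", "NANCY NA b", perl = TRUE)

print(by_sep)
print(by_collapse)
print(c(risky, safer))
```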

Upvotes: 2

akrun

Reputation: 887118

With unite (from tidyr), there is an na.rm argument, which is FALSE by default

library(tidyr)
library(dplyr)
df %>%
   mutate_all(as.character) %>%
   unite(collapse, a, b,c,  remove = FALSE, sep=" ") %>%
   unite(collapse_nona, a, b, c, remove = FALSE, sep=" ", na.rm = TRUE) %>%
   select(names(df), everything())
#   a    b c collapse collapse_nona
#1 a <NA> a   a NA a           a a
#2 b    a b    b a b         b a b
#3 c    b e    c b e         c b e
#4 d    c g    d c g         d c g
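
In dplyr 1.0.0 and later, mutate_all is superseded by across. A hedged sketch of the same unite idea written that way, assuming reasonably recent dplyr and tidyr versions are installed (only the collapse_nona column is built here, to keep it short):

```r
library(dplyr)
library(tidyr)

df <- data.frame(a = c("a", "b", "c", "d"),
                 b = c(NA, "a", "b", "c"),
                 c = c("a", "b", "e", "g"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # Make sure every column is character before uniting.
  mutate(across(everything(), as.character)) %>%
  # na.rm = TRUE drops missing values before the columns are joined.
  unite(collapse_nona, a, b, c, sep = " ", remove = FALSE, na.rm = TRUE)

print(res)
```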

Or with paste and str_remove_all (from stringr). Note that paste and str_c are vectorized, so there is no need to loop over each row with rowwise

library(stringr)
df %>%
     mutate(collapse = paste(a, b, c), 
            collapse_nona = str_remove_all(collapse,  "\\sNA|NA\\s"))
#  a    b c collapse collapse_nona
#1 a <NA> a   a NA a           a a
#2 b    a b    b a b         b a b
#3 c    b e    c b e         c b e
#4 d    c g    d c g         d c g

Another option is pmap to loop over each row, remove the NA elements with na.omit and then paste or str_c (from stringr)

library(dplyr)
library(stringr)
library(purrr)
df %>%
     mutate_all(as.character) %>% 
     mutate(collapse_nona = pmap_chr(., ~ c(...) %>%
                na.omit %>%
                str_c(collapse=" "))) 
#  a    b c collapse_nona
#1 a <NA> a           a a
#2 b    a b         b a b
#3 c    b e         c b e
#4 d    c g         d c g
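
For comparison, the same row-by-row idea can be sketched without any packages. This is a base R alternative, not part of the answer above, and it assumes every column is character or factor, so that apply's conversion to a character matrix keeps the NA values intact:

```r
df <- data.frame(a = c("a", "b", "c", "d"),
                 b = c(NA, "a", "b", "c"),
                 c = c("a", "b", "e", "g"))

# For each row: drop the missing values, then join what is left with spaces.
df$collapse_nona <- unname(apply(df[, c("a", "b", "c")], 1, function(row) {
  paste(na.omit(row), collapse = " ")
}))

print(df)
```

Here collapse is the right argument, because each call receives the values of a single row as one vector.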

Upvotes: 4
