nsmd

Reputation: 1

Issue with na.rm = TRUE when combining multiple character columns using unite from tidyr

When I try to combine multiple character columns using unite (from tidyr, used alongside dplyr), the na.rm = TRUE option does not remove the NA values.

Step by step:

  1. The original dataset has 5 columns, word1:word5.
  2. I'm looking to combine word1:word5 into a single column using this code:
    data_unite_5 <- data_original_5 %>%
        unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE)
  3. I've tried using mutate_if(is.factor, as.character), but that did not work.

Any suggestions would be appreciated.

Upvotes: 0

Views: 269

Answers (1)

Simon.S.A.

Reputation: 6931

You have misinterpreted how the na.rm argument works for unite. Following the examples in the tidyr documentation for unite, z is the result of uniting x and y.

With na.rm = FALSE

#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 a_b   a     b    
#> 2 a_NA  a     NA   
#> 3 NA_b  NA    b    
#> 4 NA_NA NA    NA   

With na.rm = TRUE

#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 "a_b" a     b    
#> 2 "a"   a     NA   
#> 3 "b"   NA    b    
#> 4 ""    NA    NA  

Hence na.rm determines how NA values appear in the assembled strings (pentawords); it does not drop rows from the data.

If you want to remove rows like the fourth one in the example above, where every input is NA and the united string is empty, I would recommend filter.

data_unite_5 <- data_original_5 %>%
  unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE) %>%
  filter(pentawords != "")

This excludes from your output every row where pentawords is an empty string.
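
To try this without the original data, here is a minimal sketch. The contents of data_original_5 below are made up purely for illustration (the real dataset was not posted as text), and it assumes a tidyr version recent enough to support na.rm in unite (1.0.0 or later) with the word columns stored as character.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for data_original_5; the values are invented.
data_original_5 <- tibble(
  word1 = c("the", "a", NA),
  word2 = c("quick", NA, NA),
  word3 = c("brown", NA, NA),
  word4 = c("fox", NA, NA),
  word5 = c("jumps", NA, NA)
)

data_unite_5 <- data_original_5 %>%
  # na.rm = TRUE leaves NA values out of each assembled string
  unite("pentawords", word1:word5, sep = " ", na.rm = TRUE, remove = FALSE) %>%
  # rows where every word was NA end up as "" and are dropped here
  filter(pentawords != "")

data_unite_5$pentawords
```

With this toy input, the second row keeps only "a" (its NA values are left out of the string), and the third row, where every word is NA, is removed by filter rather than by unite.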

Upvotes: 0
