socialscientist

Reputation: 4232

tidyr::separate() producing unexpected results

I am providing a data frame to tidyr::separate() and getting unexpected results. I have a minimal working example below where I show how I am using it, what I expect it to produce, and what it is actually producing. Why is this not working?

library(tidyr)  # tidyr also re-exports the %>% pipe

# Create toy data frame
dat <- data.frame(text = c("time_suffer|suffer_employ|suffer_sick"),
                  stringsAsFactors = FALSE)

# Separate variable into 3 columns a,b,c using | as a delimiter
dat %>% tidyr::separate(., col = "text", into = c("a","b","c"), sep = "|")

# What I'm expecting
data.frame(a = "time_suffer", b = "suffer_employ", c = "suffer_sick")

# What I'm actually getting:
data.frame(a = NA, b = "t", c = "1")

I am also getting the warning "Warning message: Expected 3 pieces. Additional pieces discarded in 1 rows [1]."

Upvotes: 1

Views: 379

Answers (1)

Calum You

Reputation: 15062

According to the documentation, the sep argument to separate() is interpreted as a regular expression when it is a character string (extremely useful if you have complicated separators). This does mean, however, that you need to escape characters with special meaning in regular expressions if you want to match them literally. In a regular expression, | is the alternation operator, so a bare "|" does not match the pipe character itself and the text is not split where you expect. Use "\\|" as your separator:

library(tidyverse)
dat <- data.frame(text = c("time_suffer|suffer_employ|suffer_sick"), 
                  stringsAsFactors = FALSE)

dat %>%
  tidyr::separate(., col = "text", into = c("a","b","c"), sep = "\\|")
#>             a             b           c
#> 1 time_suffer suffer_employ suffer_sick

Created on 2019-04-02 by the reprex package (v0.2.1)
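
As an alternative sketch (my own addition, not something taken from the documentation), a character class such as "[|]" also makes the pipe literal, which some people find easier to read than the double backslash. The base R strsplit() call with fixed = TRUE is included only as a cross-check that the pipe is being treated as a plain string; the names res and pieces are purely illustrative.

```r
library(tidyr)

dat <- data.frame(text = "time_suffer|suffer_employ|suffer_sick",
                  stringsAsFactors = FALSE)

# Inside a character class the pipe has no special meaning,
# so "[|]" matches a literal "|" without any backslashes.
res <- separate(dat, col = "text", into = c("a", "b", "c"), sep = "[|]")

# Base R cross-check: fixed = TRUE treats the pattern as a plain string,
# not as a regular expression.
pieces <- strsplit(dat$text, "|", fixed = TRUE)[[1]]
```

Both res and pieces should hold the same three values: time_suffer, suffer_employ and suffer_sick. If you are on tidyr 1.3.0 or later, separate_wider_delim() is also worth a look, since its delim argument is treated as a literal string by default.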

Upvotes: 4
