Getting the second item after using str_split() in R

Question

I have a data frame that contains some questions. I want to drop the leading number and period from each question, but leave everything else. I don't really understand regex, but this seems like a perfect use for str_split(), specifically within a dplyr pipeline. However, after splitting the string, I'm not sure how to grab the second item. I tried accessing it by position, and that didn't work.

library(dplyr)
library(stringr)

x <- structure(list(question = c("01. I like my job.", 
                                 "02. I like my house.", 
                                 "03. I like my car.")), class = "data.frame", row.names = c(NA, -3L))

x %>% 
  mutate(words = str_split(question, "."))

Returns this:

question                        words
01. I like my job.                    
02. I like my house.                  
03. I like my car.

I want it to look like this:

question                             words
01. I like my job.         I like my job.           
02. I like my house.       I like my house.     
03. I like my car.         I like my car.

I've also tried using separate() and strsplit() but I couldn't make any of those work either.

Maurits Evers · Accepted Answer

I think you're looking for str_replace() (or sub() in base R):

# Backslashes are doubled inside R strings; \\s* also drops the space after the full stop
x %>% mutate(words = str_replace(question, "^\\d+\\.\\s*", ""))
#              question            words
#1   01. I like my job.   I like my job.
#2 02. I like my house. I like my house.
#3   03. I like my car.   I like my car.
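The base R route via sub() needs no packages. A minimal sketch, rebuilding the question's data frame so it runs on its own and using the anchored pattern plus \\s* so that no leading space is left behind:

```r
# Same data as in the question
x <- data.frame(question = c("01. I like my job.",
                             "02. I like my house.",
                             "03. I like my car."))

# sub() replaces only the first match; ^ restricts that match to the start of each string,
# and \\s* removes the space after the full stop
x$words <- sub("^\\d+\\.\\s*", "", x$question)
x
```

R's default regex engine accepts \d and \s as shorthands for [[:digit:]] and [[:space:]], so the pattern is the same one you would pass to str_replace().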

Explanation:

^ anchors the match to the start of the string
\d+\. matches one or more digits followed by a literal full stop (the dot is escaped because a bare . matches any character)
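Escaping the dot is also why the call in the question, str_split(question, "."), returned nothing useful: as a regular expression "." matches every character, so every character becomes a separator. A short sketch comparing the unescaped pattern with an escaped dot and with stringr's fixed():

```r
library(stringr)

s <- "01. I like my job."

# Unescaped "." is a regex wildcard: every character is a delimiter,
# so only empty strings are left
str_split(s, ".")[[1]]

# Escaped dot (or fixed() for a literal match): splits only at real full stops,
# leaving a leading space in the second piece and an empty piece after the final dot
str_split(s, "\\.")[[1]]
str_split(s, fixed("."))[[1]]
```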

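You can also use str_split(). The dot has to be escaped here too, and the second piece starts with a space, so trim it before adding the full stop back:

library(purrr)  # for map_chr()

x %>% mutate(words = paste0(str_trim(map_chr(str_split(question, "\\."), 2)), "."))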

giving the same result.
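If a question may itself contain further full stops (the row "04. I like Dr. Smith." below is a hypothetical example), splitting at every dot and taking the second piece cuts the sentence short. A sketch that splits only at the first full stop, either with the n argument of str_split() or with tidyr's separate() and extra = "merge"; it assumes dplyr, purrr, stringr and tidyr are installed:

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# The question's data plus a hypothetical row containing a second full stop
x <- data.frame(question = c("01. I like my job.",
                             "02. I like my house.",
                             "04. I like Dr. Smith."))

# n = 2 stops after the first split, so the rest of the sentence stays in one piece
x %>% mutate(words = map_chr(str_split(question, "\\.\\s*", n = 2), 2))

# separate() splits into two columns; extra = "merge" keeps everything after the first
# separator together, and remove = FALSE keeps the original question column
x %>% separate(question, into = c("number", "words"),
               sep = "\\.\\s*", extra = "merge", remove = FALSE)
```

In recent tidyr releases separate() is superseded by separate_wider_delim() and separate_wider_regex(), but it still works.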
