Reputation: 2417
I have a data frame that contains some questions. I want to drop the leading number and period from each question but leave everything else. I don't really understand regex, but this seems like a perfect use for str_split(), specifically within a dplyr pipeline. However, after splitting the string, I'm not sure how to grab the second item. I tried accessing it by position, and that didn't work.
x <- structure(list(question = c("01. I like my job.",
                                 "02. I like my house.",
                                 "03. I like my car.")),
               class = "data.frame", row.names = c(NA, -3L))
x %>%
  mutate(words = str_split(question, "."))
Returns this:
question words
01. I like my job. <chr [19]>
02. I like my house. <chr [21]>
03. I like my car. <chr [19]>
I want it to look like this:
question words
01. I like my job. I like my job.
02. I like my house. I like my house.
03. I like my car. I like my car.
I've also tried using separate() and strsplit(), but I couldn't make either of those work.
Upvotes: 4
Views: 5286
Reputation: 39154
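You can change the pattern to \\. (an escaped, literal period; an unescaped . is a regex wildcard that matches every character, which is why the split produced so many pieces), and then get the second element of each split for the words column.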
library(tidyverse)
x %>%
  # str_split() returns a list; map_chr(..., 2) takes the second piece of each row
  mutate(words = map_chr(str_split(question, "\\. "), 2))
# question words
# 1 01. I like my job. I like my job.
# 2 02. I like my house. I like my house.
# 3 03. I like my car. I like my car.
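To illustrate why the extraction has to happen per element, here is a minimal sketch, assuming the questions are held in a plain character vector named question (a stand-in for x$question). str_split() returns a list with one character vector per input string, so indexing with [[1]][[2]] yields a single value taken from the first string only, which mutate() would then recycle down the whole column.

```r
library(stringr)
library(purrr)

# Stand-in for x$question (assumed here so the sketch is self-contained)
question <- c("01. I like my job.",
              "02. I like my house.",
              "03. I like my car.")

# One character vector of pieces per input string
pieces <- str_split(question, "\\. ")

# Second piece of the FIRST string only: a single value
first_only <- pieces[[1]][[2]]

# Second piece of EVERY string: one value per row
words <- map_chr(pieces, 2)

print(first_only)
print(words)
```

The variable names pieces, first_only, and words are illustrative only.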
Upvotes: 2
Reputation: 50668
I think you're looking for str_replace (or sub in base R):
x %>% mutate(words = str_replace(question, "^\\d+\\.\\s*", ""))
# question words
#1 01. I like my job. I like my job.
#2 02. I like my house. I like my house.
#3 03. I like my car. I like my car.
Explanation:
^ is the left string anchor
\\d+\\. matches one or more digit(s) followed by a full stop
\\s* matches any whitespace after the full stop, so the result has no leading space
You can also use str_split in the following way:
x %>% mutate(words = paste0(map_chr(str_split(question, "\\.\\s*"), 2), "."))
giving the same result.
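The base R route mentioned above (sub) could look like the following. This is a minimal sketch, assuming the questions are in a plain character vector named question (a stand-in for x$question); the trailing \\s* in the pattern is an addition that also drops the space after the period.

```r
# Stand-in for x$question (assumed here so the sketch is self-contained)
question <- c("01. I like my job.",
              "02. I like my house.",
              "03. I like my car.")

# sub() replaces only the first match in each string:
# leading digits, a literal period, then any whitespace
words <- sub("^\\d+\\.\\s*", "", question)

print(words)
# [1] "I like my job."   "I like my house." "I like my car."
```

Inside a dplyr pipeline the same call can be placed directly in mutate(), for example mutate(words = sub("^\\d+\\.\\s*", "", question)).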
Upvotes: 6