Is there a more efficient way to extract the words between "this" and "that"? How can I do this without stringr?
txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
library(stringr)
w_start <- "this"
w_end <- "that"
pattern <- paste0(w_start, "(.*?)", w_end)
wordsbetween <- unlist(str_extract_all(txt, pattern))
gsub("^\\s+|\\s+$", "", str_sub(wordsbetween, nchar(w_start)+1, -nchar(w_end)-1))
[1] "and" "goes with" "is a long way from"
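A base R sketch of the same capture-group idea, assuming PCRE (perl = TRUE) so the lazy .*? behaves as intended: regmatches() with gregexpr() stands in for str_extract_all(), sub() keeps only the captured group, and trimws() (R >= 3.2.0) replaces the gsub() trim. The variable names matches and between are only for illustration.

```r
# Base R sketch of the stringr pipeline above (no add-on packages)
txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
w_start <- "this"
w_end <- "that"
# Same pattern as before: a lazy capture group between the two words
pattern <- paste0(w_start, "(.*?)", w_end)
# Equivalent of str_extract_all(): every full "this ... that" match
matches <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
# Keep only the captured group, then strip surrounding whitespace
between <- trimws(sub(pattern, "\\1", matches, perl = TRUE))
# between is c("and", "goes with", "is a long way from")
print(between)
```

This gives the same three trimmed strings as the stringr version, without needing str_sub() and nchar() to cut off the delimiter words.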
Here's another rough attempt using strsplit, though it can probably be refined further:
txtspl <- unlist(strsplit(gsub("[[:punct:]]","",txt),"this|that"))
txtspl[txtspl!=" "][-1]
#[1] " and " " goes with " " is a long way from "
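One possible refinement, sketched under the same assumptions as the strsplit answer (there is text before the first "this", and only whitespace sits between a "that" and the next "this"): trim every piece with trimws() and drop empty pieces with nzchar() instead of comparing against a single space. Like the original, it splits on "this|that" anywhere, so a word such as "thistle" would also be split.

```r
txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
# Remove punctuation, then split on either delimiter word
parts <- strsplit(gsub("[[:punct:]]", "", txt), "this|that")[[1]]
# Trim whitespace and drop the empty pieces left between "that" and the next "this"
parts <- trimws(parts)
parts <- parts[nzchar(parts)]
# Drop the leading text that precedes the first "this"
between <- parts[-1]
between
```

The result is c("and", "goes with", "is a long way from"), already trimmed.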
This is an approach I use in qdap:
library(qdap)
genXtract(txt, "this", "that")
## > genXtract(txt, "this", "that")
## this : that1 this : that2 this : that3
## " and " " goes with " " is a long way from "
Without an add-on package (base R only):
regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))
## > regmatches(txt, gregexpr("(?<=this).*?(?=that)", txt, perl=TRUE))
## [[1]]
## [1] " and " " goes with " " is a long way from "
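The lookaround regex can be wrapped in a small reusable function. extract_between() is a hypothetical helper name, not part of base R or qdap; it adds \\b word boundaries (an assumption, so that e.g. "thistle" does not count as "this") and trims the results. The start and end arguments are pasted into the pattern unescaped, so they are treated as regular expressions.

```r
# Hypothetical helper: text between each start word and the next end word
extract_between <- function(x, start, end) {
  # Lookbehind/lookahead keep the delimiters out of the match; \\b adds word boundaries
  pattern <- paste0("(?<=\\b", start, "\\b).*?(?=\\b", end, "\\b)")
  # One list element per input string, each a character vector of trimmed matches
  lapply(regmatches(x, gregexpr(pattern, x, perl = TRUE)), trimws)
}

txt <- "I want to extract the words between this and that, this goes with that, this is a long way from that"
res <- extract_between(c(txt, "this thistle is not that"), "this", "that")
res
```

Because gregexpr() and regmatches() are vectorised, the helper handles several strings at once and returns one list element per input; here res[[2]] is "thistle is not", since "thistle" is not treated as a start word.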