Reputation: 69
I want to extract the text between the word "CHAIRMAN" and the last period ".", including both "CHAIRMAN" and the period itself. I have a character vector like this: "CHAIRMAN massive amount of text."
Both "CHAIRMAN" and "." appear several times in the text, and I only want the text from the first occurrence of "CHAIRMAN" to the last occurrence of ".". I want to use regular expressions.
Thanks.
Upvotes: 1
Views: 52
Reputation: 18681
.+ and .* both match greedily, so you can just do the following:
string = "The CHAIRMAN massive amount of text. CHAIRMAN massive amount of text. This is just a place holder"
stringr::str_extract(string, "CHAIRMAN.+\\.")
# [1] "CHAIRMAN massive amount of text. CHAIRMAN massive amount of text."
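For anyone who prefers to avoid the stringr dependency, the same greedy idea can be sketched in base R with regexpr() and regmatches(). The sample text below is made up for illustration and is not the asker's data.

```r
# Hypothetical sample text: "CHAIRMAN" and "." each occur more than once.
string <- "The CHAIRMAN opened. The CHAIRMAN closed. Trailing words"

# regexpr() locates the first match; because .+ is greedy, that match
# stretches from the first "CHAIRMAN" to the last period in the string.
m <- regexpr("CHAIRMAN.+\\.", string)

# regmatches() pulls out the matched substring.
result <- regmatches(string, m)
print(result)
```

If the pattern does not match at all, regmatches() returns a zero-length character vector, so it may be worth checking length(result) before using it.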
Upvotes: 2
Reputation: 37641
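You can do that with sub(). The lazy .*? skips everything before the first "CHAIRMAN", the capture group runs greedily to the last period, and the trailing .* consumes the rest, so only the captured group is kept: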
TEXT = "CHAIRMAN massive amount of text."
sub(".*?(CHAIRMAN.*\\.).*", "\\1", TEXT)
# [1] "CHAIRMAN massive amount of text."
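A slightly longer sketch of the sub() approach, on a made-up string where "CHAIRMAN" and "." both repeat. Note that perl = TRUE is an addition here (an assumption on my part, not something the answer above uses), chosen so that the lazy .*? is handled by the PCRE engine.

```r
# Hypothetical sample text with text before the first "CHAIRMAN"
# and after the last period.
TEXT <- "Intro. The CHAIRMAN spoke. The CHAIRMAN left. Trailing words"

pattern <- ".*?(CHAIRMAN.*\\.).*"

# The whole string is replaced by capture group 1.
extracted <- sub(pattern, "\\1", TEXT, perl = TRUE)
print(extracted)

# When the pattern does not match, sub() returns the input unchanged,
# so a failed extraction is not distinguishable from the output alone.
no_match <- sub(pattern, "\\1", "no keyword here", perl = TRUE)
print(no_match)
```

Because sub() hands back the original string when nothing matches, it may be safer to test with grepl() first if the input is not guaranteed to contain "CHAIRMAN" followed by a period.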
Upvotes: 2