It_hurts_when_ip

Reputation: 69

Extract text using regular expressions

I want to extract the text between the word "CHAIRMAN" and the last period "." in the text, including "CHAIRMAN" and ".". I have the following character vector: "CHAIRMAN massive amount of text."

"CHAIRMAN" and "." are mentioned several times in the text, and I only want to extract the text between the first time "CHAIRMAN" is used and the last time the period "." is used. I want to use regular expressions.

Thanks.

Upvotes: 1

Views: 52

Answers (2)

acylam

Reputation: 18681

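Both .+ and .* are greedy: they match as many characters as possible. A pattern that starts at "CHAIRMAN" and ends with an escaped period therefore runs from the first "CHAIRMAN" to the last period: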

string <- "The CHAIRMAN massive amount of text. CHAIRMAN massive amount of text. This is just a place holder"

stringr::str_extract(string, "CHAIRMAN.+\\.")

# [1] "CHAIRMAN massive amount of text. CHAIRMAN massive amount of text."
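If the real text contains line breaks, note that "." does not match a newline by default, so the greedy match would stop at the end of the first line. A minimal base R sketch of one way around this, assuming a made-up two-line input (the variable names txt and m are illustrative only): the (?s) flag lets "." cross newlines, and perl = TRUE selects the PCRE engine.

```r
# Hypothetical two-line input; not taken from the original question.
txt <- "Intro. The CHAIRMAN opened the meeting.\nCHAIRMAN: we adjourn. Trailing words"

# (?s) makes "." match newlines too; the greedy ".*" then runs
# from the first "CHAIRMAN" to the last period in the whole string.
m <- regmatches(txt, regexpr("(?s)CHAIRMAN.*\\.", txt, perl = TRUE))
print(m)
```

When nothing matches, regmatches() returns character(0) rather than NA, so the length of the result is worth checking before using it.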

Upvotes: 2

G5W

Reputation: 37641

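You can do that with sub. The lazy .*? skips everything before the first "CHAIRMAN", the capture group greedily runs to the last period, and the trailing .* discards whatever follows, so replacing the whole match with \\1 leaves only the captured text: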

TEXT <- "CHAIRMAN massive amount of text."
sub(".*?(CHAIRMAN.*\\.).*", "\\1", TEXT)
# [1] "CHAIRMAN massive amount of text."
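One caveat with sub: when the pattern does not match, the input comes back unchanged, which is easy to mistake for a successful extraction. A possible guard, sketched under the assumption that NA is an acceptable "no match" marker (the helper name extract_chairman and the sample strings are hypothetical, and perl = TRUE is added here to pin the engine for the lazy quantifier):

```r
# Hypothetical helper; the name and the NA convention are assumptions.
extract_chairman <- function(x) {
  pattern <- ".*?(CHAIRMAN.*\\.).*"
  # sub() leaves non-matching strings untouched, so test with grepl() first.
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

res <- extract_chairman(c("Intro. CHAIRMAN speaks. CHAIRMAN ends. Tail",
                          "no keyword here."))
print(res)
```

The first element yields the span from the first "CHAIRMAN" to the last period; the second has no "CHAIRMAN" and becomes NA instead of being returned as-is.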

Upvotes: 2
