Reputation: 111
I have data of the form:
Trifle
Beef gyoza with black vinegar dipping sauce
8 Comments
I want to extract the line between Trifle and 8 Comments. The 8 here can be any number.
Upvotes: 1
Views: 85
Reputation: 44614
Another option, using trimws and the (?s) regex flag, which includes newlines in the set of characters covered by the dot:
pat <- "(?s)^.*Trifle(.+)8 Comments.*$"
trimws(gsub(pat, '\\1', x, perl=TRUE))
# [1] "Beef gyoza with black vinegar dipping sauce"
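The pattern above hardcodes the 8, while the question says the number can vary. A minimal sketch of one way to generalize it, assuming x is a single string with embedded newlines (the sample value of x below is made up for illustration):

```r
# Hypothetical sample input: one string, lines separated by "\n"
x <- "Trifle\nBeef gyoza with black vinegar dipping sauce\n12 Comments"

# (?s) lets the dot match newlines.
# (.+?) is lazy so that \\d+ can consume the whole number, not just its last digit.
pat <- "(?s)^.*Trifle(.+?)\\d+\\s+Comments.*$"

# Keep only the captured group, then strip the surrounding newlines
result <- trimws(gsub(pat, "\\1", x, perl = TRUE))
result
# [1] "Beef gyoza with black vinegar dipping sauce"
```

With a greedy (.+) the capture would swallow the leading digits of a multi-digit count, which is why the lazy quantifier is used here.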
Upvotes: 1
Reputation: 174716
Use regmatches with gregexpr:
regmatches(x, gregexpr("\\bTrifle\\b.*\\n+\\K.*(?=\\n+.*8 Comments\\b)", x, perl=TRUE))
For the general case, where the comment count can be any number:
regmatches(x, gregexpr("\\bTrifle\\b.*\\n+\\K.*(?=\\n+.*\\b\\d+\\h+Comments\\b)", x, perl=TRUE))
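Note that regmatches combined with gregexpr returns a list with one element per input string. A short sketch of getting a plain character vector out of it, again assuming x is a single newline-separated string (the sample value is hypothetical):

```r
# Hypothetical sample input
x <- "Trifle\nBeef gyoza with black vinegar dipping sauce\n23 Comments"

# \\K drops everything matched so far, so only the middle line is returned;
# the lookahead requires a following line that ends in "<digits> Comments"
pat <- "\\bTrifle\\b.*\\n+\\K.*(?=\\n+.*\\b\\d+\\h+Comments\\b)"

m <- regmatches(x, gregexpr(pat, x, perl = TRUE))  # a list of length 1
result <- unlist(m)                                # flatten to a character vector
result
```

Because the dot does not match newlines without (?s), the .* after \\K stops at the end of the middle line, so no trimws call is needed here.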
Upvotes: 2