user459

Reputation: 111

Read lines between two specific words in a text document

I have data of the form:

Trifle  
Beef gyoza with black vinegar dipping sauce  
8 Comments

I want to extract the line between Trifle and 8 Comments. The 8 is only an example; it can be any number.

Upvotes: 1

Views: 85

Answers (2)

Matthew Plourde

Reputation: 44614

Another option uses trimws and the (?s) regex flag, which makes the dot match newlines as well:

pat <- "(?s)^.*Trifle(.+)8 Comments.*$"
trimws(gsub(pat, '\\1', x, perl=TRUE))
# [1] "Beef gyoza with black vinegar dipping sauce"
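The pattern above hardcodes the count 8, while the question says the number can vary. A minimal sketch of one way to generalize it, assuming the input is a single string with embedded newlines (the names x_any, pat_any, and result_any are hypothetical, as is the sample count 12):

```r
# Assumed input: the question's sample as one string, with "12" standing in
# for an arbitrary comment count.
x_any <- "Trifle  \nBeef gyoza with black vinegar dipping sauce  \n12 Comments"

# (.+?) is lazy so that \\d+ can claim every digit of the count;
# a greedy (.+) would swallow the leading "1" of "12".
pat_any <- "(?s)^.*Trifle(.+?)\\d+ Comments.*$"

# sub() replaces the whole string with the captured group; trimws() strips
# the surrounding spaces and newlines.
result_any <- trimws(sub(pat_any, "\\1", x_any, perl = TRUE))
result_any
# [1] "Beef gyoza with black vinegar dipping sauce"
```

If the pattern does not match, sub() returns the input unchanged, so it may be worth checking for a match first with grepl().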

Upvotes: 1

Avinash Raj

Reputation: 174716

Use regmatches with gregexpr and a Perl-compatible pattern:

regmatches(x, gregexpr("\\bTrifle\\b.*\\n+\\K.*(?=\\n+.*8 Comments\\b)", x, perl=TRUE))


For the general case, where the number of comments can be anything:

regmatches(x, gregexpr("\\bTrifle\\b.*\\n+\\K.*(?=\\n+.*\\b\\d+\\h+Comments\\b)", x, perl=TRUE))
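As a regex-light alternative to the lookaround patterns above (a different technique, not the one this answer uses), the text can be split into lines and sliced between the two marker lines. This is only a sketch, assuming x is a single string and that the markers sit on lines of their own; x_demo, start, end, and between are hypothetical names:

```r
# Assumed input: the question's sample as one string with embedded newlines.
x_demo <- "Trifle  \nBeef gyoza with black vinegar dipping sauce  \n8 Comments"

# Split into lines and strip the trailing whitespace from each one.
lines <- trimws(strsplit(x_demo, "\n", fixed = TRUE)[[1]])

# Index of the opening marker line.
start <- which(lines == "Trifle")[1]

# Index of the first "<number> Comments" line after the opening marker.
end <- grep("^\\d+\\s+Comments$", lines, perl = TRUE)
end <- end[end > start][1]

# Everything strictly between the two markers (empty if they are adjacent).
between <- lines[seq_len(end - start - 1) + start]
between
# [1] "Beef gyoza with black vinegar dipping sauce"
```

This version returns every line between the markers, so it also covers the case where more than one line sits between them.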

Upvotes: 2
