arn

Reputation: 1

r - How can I extract multiple lines of text between two symbols?

Sorry, I am new to R, so this might be a basic question.

Say I have a text file that looks like this:

START1 
<line1>
    <line2>
<line3>
END1

START2
<line4>
    <line5>
<line6>
END2

And I want to save two objects, TEXT1 and TEXT2, that look like this:

TEXT1:

<line1>
    <line2>
<line3>

TEXT2:

<line4>
    <line5>
<line6>

So basically, I want a script that will select all the lines between two markers and preserve the formatting.

I tried using gsub like this:

TEXT1 <- gsub(".*START1 | END1.*", "", x)

but it seems like gsub only works on a single string, not across multiple lines while keeping the formatting.

Any ideas?

Upvotes: 0

Views: 870

Answers (1)

Andrew Lavers

Reputation: 4378

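Since you say you have a file, you could read it with something like text = readLines("myfile.txt"); text will then be a character vector with one element per line. The code below keeps only the lines that do not contain a START or END marker, so the indentation of the remaining lines is preserved: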

text[!grepl("(START\\d+|\\s*END\\d+)", text)]
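The filter above drops the marker lines but returns everything as one vector. If separate objects per block are wanted, as in the question, one possible sketch is to locate the marker lines with grep and slice between them. The inline sample vector, the names starts, ends and blocks, and the assumption that every START line has a matching END line after it are all illustrative, not part of the original answer:

```r
# Sample input mirroring the question; in practice use
# text <- readLines("myfile.txt")  (the file name is the one assumed above).
text <- c("START1 ", "<line1>", "    <line2>", "<line3>", "END1", "",
          "START2", "<line4>", "    <line5>", "<line6>", "END2")

# Line numbers of the marker lines (trailing whitespace tolerated).
starts <- grep("^START\\d+\\s*$", text)
ends   <- grep("^\\s*END\\d+\\s*$", text)

# Assumption: markers are well formed and paired in order.
stopifnot(length(starts) == length(ends), all(starts < ends))

# For each START/END pair, keep the lines strictly between the markers.
# seq_len() handles an empty block (END directly after START) safely.
blocks <- Map(function(s, e) text[s + seq_len(e - s - 1)], starts, ends)
names(blocks) <- paste0("TEXT", seq_along(blocks))
```

Each element of blocks is then a character vector with the original indentation intact, for example writeLines(blocks$TEXT1) prints the first block, and blocks$TEXT2 holds the second.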

Upvotes: 2
