Remove the first N lines from a character column in a data frame

I have a data frame containing emails. There is a column named "message" that looks like this:

> dataset$message[1]
> [1] Message-ID:...
> Date: ...
> From: ...
> To:...
> Subject: ...
> Mime-Version: ...
> Content-Type:...
> Content-Transfer-Encoding: ...
> X-From:...
> X-To: ...
> X-cc:...
> X-bcc: ...
> X-Folder: ...
> X-Origin: ...
> X-FileName: ...
> Some message text

In other words, each entry contains 15 lines of headers and then the text. What I want is to remove these 15 lines from each row and be left only with the text, so that

> dataset$message[1]

looks like this:

> Some message text

Upvotes: 0

Answers (1)

Andre Elrico

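Something like this should work. The pattern is anchored at the start of the string and matches exactly 15 lines (each `.*\\n` is one line, because `.` does not match a newline by default), and `sub()` replaces them with an empty string: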

sub("^(?:.*\\n){15}", "", multiline_string_mail, perl = TRUE)

#[1] "Super secret message"

Example data (you should always provide usable example data):

multiline_string_mail =
"hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
hehe
Super secret message"
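
Since `sub()` is vectorised, the same call can be applied to the whole column at once. The sketch below assumes the column is a character vector (not a factor) and that every row has at least `n` header lines; the two-row `dataset` and the helper name `drop_first_lines` are made up for illustration.

```r
# Hypothetical helper: drop the first `n` lines of every string in `x`.
# Strings with fewer than `n` newline-terminated lines are returned unchanged,
# because the pattern simply does not match them.
drop_first_lines <- function(x, n) {
  pattern <- sprintf("^(?:.*\\n){%d}", n)
  sub(pattern, "", x, perl = TRUE)
}

# Made-up stand-in for the asker's data: 15 header lines, then the body.
headers <- paste(rep("Header: value", 15), collapse = "\n")
dataset <- data.frame(
  message = c(
    paste(headers, "Some message text", sep = "\n"),
    paste(headers, "Another message\nwith two lines", sep = "\n")
  ),
  stringsAsFactors = FALSE
)

# Overwrite the column with the header-free text.
dataset$message <- drop_first_lines(dataset$message, 15)

dataset$message[1]
#[1] "Some message text"
```

If the column was read in as a factor, convert it with `as.character()` first. `sub()` would still return a character vector, but converting explicitly makes the intent clear.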

Upvotes: 1
