Reputation: 121
I have text that looks like:
txt <- "Name, Name. Title. Pub. Year; Details."
I want to extract only Pub.
I can extract year and details using:
gsub(".*\\.(.*)\\..*", "\\1", txt)
How can I extract everything between the third-to-last and second-to-last periods (just Pub) in R?
Upvotes: 4
Views: 152
Reputation: 627100
You may use sub (since you only need to perform a single search-and-replace operation) as follows:
txt <- "Name, Name. Title. Pub. Year; Details."
sub(".*\\.([^.]*)(?:\\.[^.]*){2}$", "\\1", txt)
# => [1] " Pub"
See the R demo.
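Note that the captured value keeps the space that follows the preceding period. A minimal sketch, assuming you want the bare name without that space, wraps the same call in trimws() (base R, available since R 3.2.0); the variable name pub is only illustrative:

```r
txt <- "Name, Name. Title. Pub. Year; Details."

# Same pattern as above; trimws() strips the leading space left in Group 1
pub <- trimws(sub(".*\\.([^.]*)(?:\\.[^.]*){2}$", "\\1", txt))
pub
```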
Details
.* - any 0+ chars, as many as possible
\\. - a .
([^.]*) - Group 1: any 0+ chars other than .
(?:\\.[^.]*){2} - 2 consecutive sequences of:
    \\. - a .
    [^.]* - any 0+ chars other than .
$ - end of string.
Upvotes: 4
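As a cross-check on the pattern breakdown, the same field can be reached by splitting on periods instead of matching. This is only a sketch of an alternative, not part of the sub approach, and it assumes the string always ends with a period (as the sample does); the names parts and pub are illustrative:

```r
txt <- "Name, Name. Title. Pub. Year; Details."

# Split on literal periods; strsplit() does not return an empty piece
# for the final "." at the end of the string
parts <- strsplit(txt, ".", fixed = TRUE)[[1]]

# With a trailing period, the wanted field is the second-to-last piece
pub <- trimws(parts[length(parts) - 1])
pub
```

If the input might not end with a period, the index would need adjusting, so the regex version is the safer choice for mixed data.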