RAS

Reputation: 121

Extract all text between third to last and second to last period

I have text that looks like:

txt <- "Name, Name. Title. Pub. Year; Details."

I want to extract only Pub.

I can extract year and details using:

gsub(".*\\.(.*)\\..*", "\\1", txt)

How can I extract everything between the third to last and second to last period (just Pub) in R?

Upvotes: 4

Views: 152

Answers (1)

Wiktor Stribiżew

Reputation: 627100

You can use sub rather than gsub, since only a single search-and-replace operation is needed:

txt <- "Name, Name. Title. Pub. Year; Details."
sub(".*\\.([^.]*)(?:\\.[^.]*){2}$", "\\1", txt)
# => [1] " Pub"

See the R demo.

Details

  • .* - any 0+ chars, as many as possible
  • \\. - a .
  • ([^.]*) - Group 1: any 0+ chars other than .
  • (?:\\.[^.]*){2} - 2 consecutive sequences of
    • \\. - a .
    • [^.]* - any 0+ chars other than .
  • $ - end of string.
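
Note that Group 1 keeps the space after the preceding period, which is why the result is " Pub" and not "Pub". A minimal sketch of two ways to get the trimmed value, assuming the string always ends with a period as in the question (the variable names pub, parts and pub_split are only illustrative):

txt <- "Name, Name. Title. Pub. Year; Details."

# Same pattern as in the answer; trimws() removes the leading space
# that Group 1 captures after the preceding period.
pub <- trimws(sub(".*\\.([^.]*)(?:\\.[^.]*){2}$", "\\1", txt))
pub
# => [1] "Pub"

# Alternative without a regex: split on literal periods and take the
# second-to-last piece. strsplit() drops the empty string after the
# final period, so this relies on txt ending with ".".
parts <- strsplit(txt, ".", fixed = TRUE)[[1]]
pub_split <- trimws(parts[length(parts) - 1])
pub_split
# => [1] "Pub"

The split-based version is easier to read, but it indexes the wrong piece if the input does not end with a period, whereas the regex always counts periods from the end of the string.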

Upvotes: 4
