Reputation: 55
I'm trying to extract strings that follow the same pattern from the text of
The Tragedy of Romeo and Juliet by William Shakespeare:
library(readr)
txt <- read_file('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')
Text example:
Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet.
...
Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris, and [Servant] -the Clown.\r\n\r\n\r\n Cap.
I want to extract
Verona. A public place.
A Street.
I tried with
library(stringr)
str_extract(txt, "Scene\\s[IV]+\\.\\s\\s\\b[A-Z]+\\b")
It didn't work.
Thank you in advance for your advice.
Upvotes: 2
Views: 267
Reputation: 79348
# Join each "Scene ..." heading with the line after it, then extract the joined lines
str_extract_all(gsub("(Scene.*?)\r\n","\\1 ",txt),"Scene.*")
[[1]]
[1] "Scene I. Verona. A public place."
[2] "Scene II. A Street."
[3] "Scene III. Capulet's house."
[4] "Scene IV. A street."
[5] "Scene V. Capulet's house."
[6] "Scene I. A lane by the wall of Capulet's orchard."
[7] "Scene II. Capulet's orchard."
[8] "Scene III. Friar Laurence's cell."
[9] "Scene IV. A street."
[10] "Scene V. Capulet's orchard."
[11] "Scene VI. Friar Laurence's cell."
[12] "Scene I. A public place."
[13] "Scene II. Capulet's orchard."
[14] "Scene III. Friar Laurence's cell."
[15] "Scene IV. Capulet's house"
[16] "Scene V. Capulet's orchard."
[17] "Scene I. Friar Laurence's cell."
[18] "Scene II. Capulet's house."
[19] "Scene III. Juliet's chamber."
[20] "Scene IV. Capulet's house."
[21] "Scene V. Juliet's chamber."
[22] "Scene I. Mantua. A street."
[23] "Scene II. Verona. Friar Laurence's cell."
[24] "Scene III. Verona. A churchyard; in it the monument of the Capulets."
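If only the location is wanted, without the "Scene I." prefix, a capture group may be enough. The attempt in the question cannot match "Verona" because `[A-Z]+\\b` only accepts a word made entirely of capital letters, and `str_extract` returns just the first match anyway. Below is a minimal sketch, not tested against the full Gutenberg file: `sample_txt` is the two excerpts from the question joined together, and the names `sample_txt`, `scene_pattern` and `locations` are made up for illustration. It assumes every heading has the form "Scene", a Roman numeral, a period, then a line break.

```r
library(stringr)

# Two excerpts from the question joined into one string
# (the text elided between them in the question is simply left out).
sample_txt <- paste0(
  "Scene I.\r\nVerona. A public place.\r\n\r\n",
  "Enter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet.\r\n",
  "Scene II.\r\nA Street.\r\n\r\n",
  "Enter Capulet, County Paris, and [Servant] -the Clown.\r\n"
)

# Match the heading, skip the line break, and capture the rest of the next line.
# [IVX]+ is an assumption about which Roman numerals occur in the headings.
scene_pattern <- "Scene\\s[IVX]+\\.\\s+([^\\r\\n]+)"

# str_match_all() returns one matrix per input string;
# column 1 is the full match, column 2 is the first capture group.
locations <- str_match_all(sample_txt, scene_pattern)[[1]][, 2]
print(locations)
# [1] "Verona. A public place." "A Street."
```

Replacing `sample_txt` with `txt` should give the locations for the whole play, provided the headings in the file really follow the assumed form.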
Upvotes: 2