Kim Young Han

Reputation: 55

str_extract specific patterns

I'm trying to extract strings that follow the same pattern from the text of The Tragedy of Romeo and Juliet by William Shakespeare.

library(readr)

txt <- read_file('http://www.gutenberg.org/cache/epub/1112/pg1112.txt')

Text example:

Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory (with swords and bucklers) of the house\r\nof Capulet.
...
Scene II.\r\nA Street.\r\n\r\nEnter Capulet, County Paris, and [Servant] -the Clown.\r\n\r\n\r\n Cap.

I want to extract the location line that follows each scene heading:

Verona. A public place.
A Street.

I tried:

library(stringr)

str_extract(txt, "Scene\\s[IV]+\\.\\s\\s\\b[A-Z]+\\b")

It didn't work.

Thank you in advance for your advice.

Upvotes: 2

Views: 267

Answers (1)

Onyambu

Reputation: 79348

str_extract_all(gsub("(Scene.*?)\r\n","\\1 ",txt),"Scene.*")
[[1]]
 [1] "Scene I. Verona. A public place."                                    
 [2] "Scene II. A Street."                                                 
 [3] "Scene III. Capulet's house."                                         
 [4] "Scene IV. A street."                                                 
 [5] "Scene V. Capulet's house."                                           
 [6] "Scene I. A lane by the wall of Capulet's orchard."                   
 [7] "Scene II. Capulet's orchard."                                        
 [8] "Scene III. Friar Laurence's cell."                                   
 [9] "Scene IV. A street."                                                 
[10] "Scene V. Capulet's orchard."                                         
[11] "Scene VI. Friar Laurence's cell."                                    
[12] "Scene I. A public place."                                            
[13] "Scene II. Capulet's orchard."                                        
[14] "Scene III. Friar Laurence's cell."                                   
[15] "Scene IV. Capulet's house"                                           
[16] "Scene V. Capulet's orchard."                                         
[17] "Scene I. Friar Laurence's cell."                                     
[18] "Scene II. Capulet's house."                                          
[19] "Scene III. Juliet's chamber."                                        
[20] "Scene IV. Capulet's house."                                          
[21] "Scene V. Juliet's chamber."                                          
[22] "Scene I. Mantua. A street."                                          
[23] "Scene II. Verona. Friar Laurence's cell."                            
[24] "Scene III. Verona. A churchyard; in it the monument of the Capulets."
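
If you only want the location and not the "Scene I." prefix, here is a minimal sketch of another way to do it. It is not the method used above, just an alternative that captures the line right after each scene heading. Two things in the original attempt get in the way: `[A-Z]+` matches uppercase letters only, so it cannot match a word like "Verona", and `str_extract` returns only the first match per input string. The sketch assumes the headings look like the example in the question (a Roman numeral, a period, then `\r\n`); `sample_txt` and `locations` are hypothetical names, and the sample string is a shortened stand-in for the downloaded text.

```r
library(stringr)

# Shortened stand-in for the Gutenberg text (assumes \r\n line endings,
# as in the example from the question).
sample_txt <- paste0(
  "Scene I.\r\nVerona. A public place.\r\n\r\nEnter Sampson and Gregory\r\n\r\n",
  "Scene II.\r\nA Street.\r\n\r\nEnter Capulet"
)

# Match "Scene <roman numeral>." followed by a line break, and capture
# everything on the next line up to (not including) the next line break.
m <- str_match_all(sample_txt, "Scene [IVX]+\\.\\r\\n([^\\r\\n]+)")[[1]]

# Column 1 holds the full match, column 2 holds the captured location.
locations <- m[, 2]
locations
```

On the sample string this gives "Verona. A public place." and "A Street.". To try it on the full play, pass `txt` in place of `sample_txt`; whether every heading in the file follows this exact layout is something to check against the output.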

Upvotes: 2
