Reputation: 7490
I'm trying to capture Shakespeare dialogue with a regex, as practice in regex text matching. For instance, I want to capture all the text spoken by the character CALIBAN in this scene:
PROSPERO. Thou most lying slave,
Whom stripes may move, not kindness! I have us'd thee,
Filth as thou art, with human care, and lodg'd thee
In mine own cell, till thou didst seek to violate
The honour of my child.
CALIBAN. O ho, O ho! Would't had been done.
Thou didst prevent me. I had peopl'd else
This isle with Calibans.
I'd like to capture only Caliban's lines:
O ho, O ho! Would't had been done.
Thou didst prevent me. I had peopl'd else
This isle with Calibans.
How can I do this with a regex? I tried this one:
(?<=\n CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n PROSPERO\. |$)
Note: in the actual text, there are always two whitespace characters before each new character's name, and every line ends with a carriage return.
My regex looks for "CALIBAN. " at the start, then matches some text, and requires the match to end right before "PROSPERO. ". However, when I plug it into regexp.com, my entire text is matched.
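A minimal Python reproduction of this behaviour (Python is an assumption here; the question only used an online tester). The sample text is hypothetical: the scene repeated twice, with plain "\n" line endings instead of carriage returns, and a newline plus a single space before each speaker name, which is what the lookbehind expects. With the greedy +, the single match runs from Caliban's first line to the end of the input.

```python
import re

# Hypothetical sample: the scene repeated twice. Assumptions: "\n" line
# endings (no carriage returns) and "\n " before each speaker name.
scene = (
    "\n PROSPERO. Thou most lying slave,"
    "\nWhom stripes may move, not kindness! I have us'd thee,"
    "\nFilth as thou art, with human care, and lodg'd thee"
    "\nIn mine own cell, till thou didst seek to violate"
    "\nThe honour of my child."
    "\n CALIBAN. O ho, O ho! Would't had been done."
    "\nThou didst prevent me. I had peopl'd else"
    "\nThis isle with Calibans."
)
text = scene * 2

# The regex from the question, with a greedy "+".
greedy = r"(?<=\n CALIBAN\. )[A-Za-z ',\.\n\!-]+(?=\n PROSPERO\. |$)"
matches = re.findall(greedy, text)

# Every character after the first "CALIBAN. " is in the class, so "+" runs
# to the end of the string, where "$" satisfies the lookahead. The one match
# therefore swallows Prospero's next speech as well.
print(len(matches))               # prints 1
print("PROSPERO" in matches[0])   # prints True
```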
Upvotes: 0
Views: 148
Reputation: 785521
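Your greedy + keeps consuming as long as the characters belong to the class, and every character in your sample scene does, so it runs to the end of the input, where $ lets the lookahead succeed. Use a lazy quantifier (+?) instead, so the match stops at the first position where the lookahead succeeds: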
(?<=\n CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n PROSPERO\. |$)
In PHP use:
$re = '/(?<=\n CALIBAN\. )[A-Za-z\s\',.!-]+?(?=\n PROSPERO\. |$)/';
// $str holds the scene text
preg_match_all($re, $str, $matches);
// With the default PREG_PATTERN_ORDER, $matches[0] lists every full match (one per speech)
print_r($matches[0]);
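For comparison, a rough Python equivalent of the PHP snippet. Python and the sample text are assumptions: the scene repeated twice, "\n" line endings, and "\n " before each speaker name. re.findall returns every non-overlapping match, so both of Caliban's speeches come back.

```python
import re

# Hypothetical sample: the scene repeated twice. Assumptions: "\n" line
# endings and "\n " before each speaker name (what the lookbehind expects).
scene = (
    "\n PROSPERO. Thou most lying slave,"
    "\nWhom stripes may move, not kindness! I have us'd thee,"
    "\nFilth as thou art, with human care, and lodg'd thee"
    "\nIn mine own cell, till thou didst seek to violate"
    "\nThe honour of my child."
    "\n CALIBAN. O ho, O ho! Would't had been done."
    "\nThou didst prevent me. I had peopl'd else"
    "\nThis isle with Calibans."
)
text = scene * 2

# Lazy "+?": stop at the first position where the lookahead sees
# "\n PROSPERO. " or the end of the string ("$").
lazy = r"(?<=\n CALIBAN\. )[A-Za-z\s',.!-]+?(?=\n PROSPERO\. |$)"
speeches = re.findall(lazy, text)

for speech in speeches:
    print(speech)
    print("---")
```

Because the lookahead also accepts $, the last speech in the input is captured even though no PROSPERO line follows it. The lookahead only recognises PROSPERO, though, so a speech followed by another character's line is not cut off there.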
Upvotes: 2
Reputation: 60
Try using the following regex:
CALIBAN\. ((.*\n .*)*)
Group 1 captures the text spoken by Caliban, without his name. Based on the example provided, this regex should work.
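This pattern only extends past the first line when the following lines begin with a space. If the text doesn't follow that layout, or another character speaks next, one alternative (a swapped-in variation, not this answer's regex) is to capture group 1 lazily with re.DOTALL and end it at any line consisting of a space, an upper-case name and ". ". The Python sketch below assumes a hypothetical layout: the scene repeated twice, "\n" line endings, "\n " before every speaker name, and no leading space on continuation lines. speeches_by is a hypothetical helper.

```python
import re

# Hypothetical sample: the scene repeated twice; "\n " before each speaker
# name, continuation lines without a leading space (assumed layout).
scene = (
    "\n PROSPERO. Thou most lying slave,"
    "\nWhom stripes may move, not kindness! I have us'd thee,"
    "\nFilth as thou art, with human care, and lodg'd thee"
    "\nIn mine own cell, till thou didst seek to violate"
    "\nThe honour of my child."
    "\n CALIBAN. O ho, O ho! Would't had been done."
    "\nThou didst prevent me. I had peopl'd else"
    "\nThis isle with Calibans."
)
text = scene * 2


def speeches_by(name, text):
    # Hypothetical helper: returns group 1 of each match, i.e. the lines
    # spoken by `name`. re.DOTALL lets ".+?" cross line breaks; the lazy
    # quantifier stops at the next speaker line (assumed to be a single
    # upper-case word followed by ". ") or at the end of the string (\Z).
    pattern = re.compile(
        r"\n " + re.escape(name) + r"\. (.+?)(?=\n [A-Z]+\. |\Z)",
        re.DOTALL,
    )
    return [m.group(1) for m in pattern.finditer(text)]


for speech in speeches_by("CALIBAN", text):
    print(speech)
```

Because the end condition no longer names a particular speaker, the same helper also works for Prospero, e.g. speeches_by("PROSPERO", text).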
Upvotes: 1