aciobanu

Reputation: 389

Regexp matching all the quoted strings

I am trying to write a regexp, if possible, that matches all the quoted strings in a text. An example:

ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.

I would like to get this result:

's previously reported plans for dramas ' (not useful, but I can manage it)
'Once Upon A Time,'
' '
'Revenge,'
' 'Grey'
'Grey's Anatomy,'
etc

So I would basically need to match each quote twice. If I use a standard regexp, I would not get 'Once Upon A Time,' and 'Grey's Anatomy,', for obvious reasons.

Thanks for any suggestions!

Upvotes: 0

Views: 78

Answers (1)

DavidRR

Reputation: 19397

Here's a solution in Perl that works for the given example.

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {

#   \1/ Starting at the beginning of the string or after a non-word character,
#   \2/ MATCH a single-quote character followed by a character that is
#       *not* a single-quote character,
#   \3/ And continue matching one or more times:
#       - a white space character,
#       - a word character,
#       - a comma,
#       - or a single quote that is followed by a lower-case 's' or 't'
#         and then a word boundary (an apostrophe, as in "Grey's" or "haven't").
#   \4/ And END the match on a single quote.
#   \5/ Continue searching for additional matches (the /g modifier).

    my @matches = /(?:\A|\W)('[^'](?:\w|\s|,|'(?=[st]\b))+')/g;

#                  \___1___/\__2_/\___________3__________/4/\5/

    print join("\n", @matches), "\n";
}

__END__
 'At the Beginning' ABC released its full midseason schedule today, and it features premiere dates for several new shows, along with one rather surprising timeslot change.</p><p>First of all, ABC's previously reported plans for dramas 'Once Upon A Time,' 'Revenge,' 'Grey's Anatomy,' and 'Scandal' haven't changed.

Expected output:

'At the Beginning'
'Once Upon A Time,'
'Revenge,'
'Grey's Anatomy,'
'Scandal'
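
If you really do want every quote-to-next-quote span, as in the list in the question, one possible approach is to put the capture group inside a lookahead so that each quote can be used twice. This is only a minimal sketch: the helper name `overlapping_quoted` and the shortened sample string are made up for illustration, and unlike the regex above it does not treat apostrophes specially, so 'Grey's Anatomy,' would come back as two pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the solution above): return every
# quote-to-next-quote span, including spans that share a quote character.
sub overlapping_quoted {
    my ($text) = @_;

    # The lookahead consumes no characters, so a closing quote is still
    # available to open the next span. Call this in list context.
    return $text =~ /(?=('[^']*'))/g;
}

# Shortened sample text, assumed here only for demonstration.
my $sample = "dramas 'Once Upon A Time,' 'Revenge,' and more";
print "$_\n" for overlapping_quoted($sample);
```

For this sample the sketch prints 'Once Upon A Time,' then ' ' then 'Revenge,' so you would still need to filter out the in-between spans yourself.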

Upvotes: 2
