I have a line of text such as
This "is" a test "of very interesting" problems "that can" be solved
I'm trying to split it so that my array @goodtext contains one string per quoted section. So the array would contain the following:
$goodtext[0] is
$goodtext[1] of very interesting
$goodtext[2] that can
Unfortunately, the number of quoted sections varies from line to line...
Presuming that quoted sections cannot be nested:
my @quoted = $string =~ /"([^"]+)"/g;
Or, if you need to do some processing while collecting them:
my @quoted;
while ($string =~ /"([^"]+)"/g) {
    # ... process the current match here ...
    push @quoted, $1;
}
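Since the number of quoted sections varies per line, here is a self-contained sketch of the while-loop form applied line by line. The second input line and the names $input and @per_line are hypothetical, made up for illustration; the in-memory filehandle only stands in for reading a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the question's line plus an invented second line.
my $input = <<'END';
This "is" a test "of very interesting" problems "that can" be solved
Only "one" quoted part here
END

# Read from a string as if it were a file (stand-in for a real filehandle).
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

my @per_line;    # one array ref of quoted sections per input line
while (my $line = <$fh>) {
    my @goodtext;
    while ($line =~ /"([^"]+)"/g) {
        # any per-match processing would go here
        push @goodtext, $1;
    }
    push @per_line, \@goodtext;
}
close $fh;

for my $i (0 .. $#per_line) {
    print "line $i: ", join(' | ', @{ $per_line[$i] }), "\n";
}
```

Each line gets its own fresh @goodtext, so lines with different numbers of quoted sections do not interfere with each other.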
Note that we need the closing " in the pattern, even though [^"]+ will match up to it anyway. This is so that the engine consumes it and gets past it, so that the next match of " is indeed the next opening one.
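A small sketch on the question's line that compares the pattern with and without the closing quote. Without it, each closing quote is left unconsumed and gets picked up as the opening quote of the next match, so the text between quoted sections leaks into the results. The array names are illustrative.

```perl
use strict;
use warnings;

my $string = q{This "is" a test "of very interesting" problems "that can" be solved};

# Closing quote is part of the pattern, so it is consumed.
my @right = $string =~ /"([^"]+)"/g;

# Deliberately broken: the closing quote is not consumed, so the next
# search starts at it and treats it as an opening quote.
my @wrong = $string =~ /"([^"]+)/g;

print scalar(@right), " matches: ", join('|', @right), "\n";
print scalar(@wrong), " matches: ", join('|', @wrong), "\n";
```

The second line shows six pieces, including " a test " and " problems ", instead of the three wanted ones.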
If the quotations "can be "nested" as well" then you'd want Text::Balanced.
As an aside, note the difference in behavior of the /g modifier in list and scalar contexts.
In list context, imposed by the list assignment (to @quoted in the first example), the match operator with the /g modifier returns a list of all captures, or of all whole matches if there is no capturing in the pattern (no parens).
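A minimal sketch of both list-context cases on the question's line; @captures and @whole are illustrative names.

```perl
use strict;
use warnings;

my $string = q{This "is" a test "of very interesting" problems "that can" be solved};

# With a capture group: list of the captured contents only.
my @captures = $string =~ /"([^"]+)"/g;

# Without parens: list of the whole matches, quotes included.
my @whole = $string =~ /"[^"]+"/g;

print "captures: ", join(' | ', @captures), "\n";
print "whole:    ", join(' | ', @whole), "\n";
```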
In scalar context, for example when evaluated as the while condition, its behavior with /g is more complex. After a match, the next time the regex runs it continues searching the string from where the previous match ended (one past its last character), thus iterating through the matches.
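To make that position visible, a sketch that prints pos() after each scalar-context match on the question's line. Offsets are 0-based; @ends is just an illustrative name.

```perl
use strict;
use warnings;

my $string = q{This "is" a test "of very interesting" problems "that can" be solved};

my @ends;
while ($string =~ /"([^"]+)"/g) {
    # pos() is the offset just past this match's closing quote;
    # the next iteration resumes searching from there.
    printf "%-20s next search starts at offset %d\n", $1, pos($string);
    push @ends, pos($string);
}

# The final, failed attempt (no /c modifier) resets pos() to undef.
print defined pos($string) ? "pos still set\n" : "pos reset\n";
```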
Note that we don't need a loop for this to happen, which can be a cause of subtle bugs:
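use strict;
use warnings;
use feature qw(say);

my $string = q(one simple string);
$string =~ /(\w+)/g;   # first match; pos($string) is now 3
say $1; #--> one
$string =~ /(\w+)/g;   # resumes from pos 3
say $1; #--> simple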
Without /g in either regex we don't get this behavior; instead, one is printed both times.
See "Global matching" in perlretut, and for instance the \G assertion (in perlop) and pos.
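As one illustration of \G and pos, here is a sketch of a small hand-written scanner that walks the string piece by piece with the /gc modifiers. This is an alternative technique, not what the answer above uses. It assumes that stray, unterminated quotes should just stop the scan with a warning, and it uses [^"]* so an empty "" would yield an empty string.

```perl
use strict;
use warnings;

my $string = q{This "is" a test "of very interesting" problems "that can" be solved};

my @goodtext;
pos($string) = 0;
# /c keeps pos() in place when a match fails, so the next branch can
# try again from the same offset; \G anchors each attempt at pos().
while (pos($string) < length $string) {
    if ($string =~ /\G"([^"]*)"/gc) {
        # A complete quoted section starts exactly here: keep its contents.
        push @goodtext, $1;
    }
    elsif ($string =~ /\G[^"]+/gc) {
        # A run of unquoted text: skip over it.
    }
    else {
        # Only an unterminated quote can end up here.
        warn "Unterminated quote at offset ", pos($string), "\n";
        last;
    }
}
print "$_\n" for @goodtext;
```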
Try this.
$ a='This "is" a test "of very interesting" problems "that can" be solved'
$ echo "$a" | perl -lne ' @arr=$_=~/"(.+?)"/g; print join("\n",@arr) '
is
of very interesting
that can
$
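The non-greedy "(.+?)" and the earlier "([^"]+)" agree on the question's line. A sketch with a hypothetical line containing an empty quoted section "" shows where they differ. Assuming empty sections should be kept as empty strings, [^"]* is one way to handle them.

```perl
use strict;
use warnings;

# Hypothetical input with an empty quoted section.
my $line = q{a "" b "c"};

my @plus = $line =~ /"([^"]+)"/g;   # needs at least one non-quote char
my @lazy = $line =~ /"(.+?)"/g;     # the one-liner's pattern
my @star = $line =~ /"([^"]*)"/g;   # also accepts ""

for ([plus => \@plus], [lazy => \@lazy], [star => \@star]) {
    my ($name, $got) = @$_;
    print "$name: ", join(', ', map { "[$_]" } @$got), "\n";
}
```

Both [^"]+ and .+? get out of step at the empty pair and capture the text between sections instead, while [^"]* yields an empty string and then c.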
Example of using Text::Balanced to extract the quoted substrings:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::Balanced qw/extract_multiple extract_delimited/;
my $test = q{This "is" a test "of very interesting" problems "that can" be solved};
sub just_quotes {
extract_multiple $_[0], [ sub { extract_delimited $_[0], '"' } ], undef, 1;
}
say for just_quotes $test;
This produces:
"is"
"of very interesting"
"that can"
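extract_delimited keeps the delimiters, so the fields above still carry their quotes. If the bare contents are wanted, as in the question's @goodtext, one option is to trim the first and last character of each field with substr. This sketch reuses the answer's just_quotes unchanged; the trimming step is an addition.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::Balanced qw/extract_multiple extract_delimited/;

my $test = q{This "is" a test "of very interesting" problems "that can" be solved};

sub just_quotes {
    extract_multiple $_[0], [ sub { extract_delimited $_[0], '"' } ], undef, 1;
}

# Drop the leading and trailing quote of each extracted field.
my @goodtext = map { substr($_, 1, -1) } just_quotes($test);

print "$_\n" for @goodtext;
```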