Robert Dodd

Reputation: 1

Perl split and regex query

I have a line of text such as

This "is" a test "of very interesting" problems "that can" be solved

I'm trying to split it so that my array @goodtext contains one string for each quoted section. So my array would contain the following:

$goodtext[0] is
$goodtext[1] of very interesting
$goodtext[2] that can

The number of quoted sections in each line varies, unfortunately...

Upvotes: 0

Views: 194

Answers (3)

zdim

Reputation: 66881

Presuming that the quotes cannot be meaningfully nested:

my @quoted = $string =~ /"([^"]+)"/g;

or, if you need to be able to do some processing while collecting them

my @quoted;
while ($string =~ /"([^"]+)"/g) {
    # ...
    push @quoted, $1;
}

Note that we need the closing ", even though [^"]+ will match up to it anyway. This is so that the engine consumes it and gets past it, so the next match of " is indeed the next opening one.
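To see why the closing quote matters, here is a small sketch that runs the pattern with and without it on the sample line from the question. The variable names @with_close and @without_close are only for illustration and are not from the original post.

```perl
use strict;
use warnings;

my $string = q{This "is" a test "of very interesting" problems "that can" be solved};

# With the closing quote: each match consumes a full "..." pair.
my @with_close = $string =~ /"([^"]+)"/g;

# Without it: the engine stops just before the closing quote, so that
# quote is then taken as the next "opening" one.
my @without_close = $string =~ /"([^"]+)/g;

print "with:    [$_]\n" for @with_close;
print "without: [$_]\n" for @without_close;
```

The second list also picks up the text between the quoted sections (" a test ", " problems ", " be solved"), which is exactly what consuming the closing quote prevents.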

If the quotations "can be "nested" as well" then you'd want Text::Balanced


As an aside, note the difference in behavior of the /g modifier in list and scalar contexts.

  • In list context, imposed here by the list assignment (to @quoted in the first example), the match operator with the /g modifier returns a list of all captures, or of all matches if the pattern has no capturing parentheses.

  • In the scalar context, when evaluated as the while condition (for example), its behavior with /g is more complex. After a match, the next time the regex runs it continues searching the string from the position of (one after) the previous match, thus iterating through matches.

    Note that we don't need a loop for this (which is a common source of subtle bugs):

      my $string = q(one simple string);
    
      $string =~ /(\w+)/g; 
      say $1;               #--> one
    
      $string =~ /(\w+)/g;
      say $1;               #--> simple
    

    Without /g in either regex we don't get this behavior, but rather one is printed both times.
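The list-context behavior from the first bullet can be sketched as follows. The sample string and the names @whole and @captures are made up for this illustration.

```perl
use strict;
use warnings;

my $string = q(a=1, b=2);

# No parens in the pattern: list context returns every whole match.
my @whole = $string =~ /\w=\d/g;         # ('a=1', 'b=2')

# With parens: list context returns the captures, flattened in order.
my @captures = $string =~ /(\w)=(\d)/g;  # ('a', '1', 'b', '2')

print join('|', @whole), "\n";       # a=1|b=2
print join('|', @captures), "\n";    # a|1|b|2
```

With more than one capture group the result is a flat list, so it usually needs to be consumed in pairs (or triples, and so on).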

See Global matching in perlretut, and for instance \G assertion (in perlop) and pos
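As a rough illustration of what pos reports, the sketch below repeats the two scalar-context matches from above and records the position after each one. The @positions array and the $again variable are hypothetical names used only here.

```perl
use strict;
use warnings;

my $string = q(one simple string);
my @positions;

$string =~ /(\w+)/g;             # matches "one"
push @positions, pos($string);   # offset just past "one"

$string =~ /(\w+)/g;             # resumes from there, matches "simple"
push @positions, pos($string);   # offset just past "simple"

pos($string) = 0;                # rewind the search position by hand
$string =~ /(\w+)/g;
my $again = $1;                  # back to the first word

print "positions: @positions\n"; # positions: 3 10
print "again: $again\n";         # again: one
```

Assigning to pos like this is one way to make sure a later /g match on the same string starts from a known place.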

Upvotes: 4

stack0114106

Reputation: 8711

Try this.

$ a='This "is" a test "of very interesting" problems "that can" be solved'

$ echo "$a" | perl -lne ' @arr=$_=~/"(.+?)"/g; print join("\n",@arr) '
is
of very interesting
that can

$

Upvotes: 1

Shawn

Reputation: 52409

Example of using Text::Balanced to extract the quoted substrings:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::Balanced qw/extract_multiple extract_delimited/;

my $test = q{This "is" a test "of very interesting" problems "that can" be solved};

sub just_quotes {
  extract_multiple $_[0], [ sub { extract_delimited $_[0], '"' } ], undef, 1;
}

say for just_quotes $test;

This will produce:

"is"
"of very interesting"
"that can"
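Since the question asks for the text without the surrounding quotes, one possible follow-up is to strip them from each extracted field. This is only a sketch: it reuses the just_quotes helper from above, assumes Perl 5.14 or later for the /r substitution flag, and borrows the @goodtext name from the question.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use Text::Balanced qw/extract_multiple extract_delimited/;

my $test = q{This "is" a test "of very interesting" problems "that can" be solved};

# Same helper as in the answer above.
sub just_quotes {
  extract_multiple $_[0], [ sub { extract_delimited $_[0], '"' } ], undef, 1;
}

# Strip one leading and one trailing quote from each extracted field.
# /r returns the modified copy instead of changing the field in place.
my @goodtext = map { s/\A"|"\z//gr } just_quotes($test);

say for @goodtext;
```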

Upvotes: 3
