Lukaaaaaaaay

Reputation: 141

Matching multiline string in file using perl regex

I am reading in another Perl file and trying to find all strings surrounded by quotation marks within the file, whether single-line or multiline. I've matched all the single-line strings fine, but I can't match the multiline ones without printing the entire line, when I just want the string itself. For example, here's a snippet of what I'm reading in:

#!/usr/bin/env perl
use warnings;
use strict;

# assign variable

my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

so the output I'd like is

'Hello World!';
"chmod";
"This is a fun multiple line string, please match";

but I am getting:

'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

This is the code I am using to find the strings - all file content is stored in @contents:

my @strings_found = ();
my $line; 
for(@contents) {
    $line .= $_;
}

if($line =~ /(['"](.?)*["'])/s) {
    push @strings_found,$1;
}

print @strings_found;

I am guessing I only get 'Hello World!'; correctly because I am using $1, but I am not sure how else to find the others without looping line by line. I would think that looping would make it hard to find the multiline string, since the loop doesn't know what the next line is.

I know my regex is reasonably basic and doesn't account for some caveats, but I just wanted to get a basic catch-most regex working before moving on to more complex situations.

Any pointers as to where I am going wrong?

Upvotes: 1

Views: 15106

Answers (2)

Miller

Reputation: 35208

Two big things. First, you need to search in a while loop with the g modifier on your regex, so that every match is found rather than only the first. Second, you need to turn off greedy matching for what's inside the quotes by using .*?.

use strict;
use warnings;

my $contents = do {local $/; <DATA>};

my @strings_found = ();

while ($contents =~ /(['"](.*?)["'])/sg) {
    push @strings_found, $1;
}

print "$_\n" for @strings_found;

__DATA__
#!/usr/bin/env perl
use warnings;
use strict;

# assign variable

my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun 
    multiple line string, please match";

Outputs

'Hello World!'
"chmod"
"This is a fun
    multiple line string, please match"

You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) finding words surround by quotations perl

Upvotes: 5

gaoithe

Reputation: 4398

Regexp quantifiers (in Perl and generally) are greedy by default, so your regexp will match from the first ' or " to the last. Print the length of your @strings_found array; I think it will always be just 1 with the code you have.
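
To see the difference in isolation, here is a minimal sketch. The sample text and the variable names ($text, $greedy, $lazy) are made up for illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical sample text (q{} does not interpolate, so $x and $y are literal).
my $text = q{my $x = 'one'; my $y = "two";};

# Greedy: .* runs from the first quote to the last quote in the text.
my ($greedy) = $text =~ /(['"].*["'])/s;

# Non-greedy: .*? stops at the nearest following quote.
my ($lazy) = $text =~ /(['"].*?["'])/s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy pattern captures everything from 'one' through "two" as a single match, while the non-greedy one captures only 'one'.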

Change it to be non-greedy by following the * with a ?, something like /(['"].*?["'])/s, I think.

That will work in a basic way. Regexps are kind of the wrong tool if you want a robust solution; for that you would want to write parsing code instead. If a string contains a different kind of quote, a greedy match gives you the one biggest string, while a non-greedy match gives you the smallest strings without caring whether the start and end quotes differ.
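
One way to soften the mismatched-quote problem, short of writing a parser, is a backreference that forces the closing quote to be the same character as the opening one. This is only a sketch: the sample text and variable names are hypothetical, and it still does not handle escaped quotes, comments, or heredocs.

```perl
use strict;
use warnings;

# Hypothetical sample: a double-quoted string that contains an apostrophe.
my $text = q{my $s = "it's here"; my $t = 'plain';};

my @found;
# Group 2 captures the opening quote; \2 requires the same character to close.
# Group 1 wraps the whole quoted string, quotes included.
while ($text =~ /((['"]).*?\2)/sg) {
    push @found, $1;
}

print "$_\n" for @found;
```

With this input the loop collects "it's here" and 'plain' as two separate strings, rather than stopping the first match at the apostrophe.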

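Read about greedy and non-greedy matching. Also note the modifiers: /s lets . match newlines, which is what a multiline string needs, while the /m multiline modifier only changes where ^ and $ match. http://perldoc.perl.org/perlre.html#Regular-Expressions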

Upvotes: 1
