cooldood3490

Reputation: 2498

How to capture a multiline pattern from a file in Perl

I have a file that looks something like this:

Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"

I want to search the file for the line starting with /attribute3= and then save the string inside the quotation marks to a separate variable.

Here's what I have so far:

#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
while (my $line = <$fh>) {
    if ($line =~ /\/attribute3=/g){
        print $line . "\n";
    }
}

That prints only the first line, /attribute3="all*the*things*I'm*interested*in*are*inside*here**, but I want the whole quoted string:

all*the*things*I'm*interested*in*are*inside*here**and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sentence.*I*think*we*have*enough*words

So what I did next is:

#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
my $part_I_want;
while (my $line = <$fh>) {
    if ($line =~ /\/attribute3=/g){
        $line =~ /^/\attribute3=\"(.*?)/;   # capture everything after the quotation mark
        $part_I_want .= $1;   # the capture group; save the stuff on line 1
        # keep adding to the string until we reach the closing quotation marks
        next (unless $line =~ /\"/){
             $part_I_want .= $_;    
        }
    }
}

The code above doesn't work. How do I capture a multiline pattern between two characters (in this case, quotation marks)?

Upvotes: 1

Views: 442

Answers (3)

miken32

Reputation: 42743

From the command line:

perl -n0e '/\/attribute3="(.*)"/s && print $1' foo.txt 

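This is basically what you had, but the -0 flag with no digits sets the input record separator ($/) to the null character. A text file contains no null bytes, so the whole file is read as a single record, which has much the same effect as undef $/ within the code (-0777 is the conventional way to request slurp mode explicitly). The /s modifier lets . match newlines, so the capture can span lines. From the man page: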

-0[octal/hexadecimal]

specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator.
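To see why changing the record separator lets the match cross line breaks, here is a minimal sketch. It assumes a shortened, made-up sample in place of foo.txt and reads it through an in-memory filehandle; the helper name first_record is hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for foo.txt
my $text = <<'END';
Random words go here
/attribute2
/attribute3="first*line*
second*line"
END

# Read the first record of $text using the given record separator
sub first_record {
    my ($separator) = @_;
    open(my $fh, '<', \$text) or die $!;
    local $/ = $separator;
    my $record = <$fh>;
    close $fh;
    return $record;
}

my $by_line = first_record("\n");    # default: one line per record
my $by_nul  = first_record("\0");    # what -0 with no digits sets
my $slurped = first_record(undef);   # what undef $/ (or -0777) does

print "NUL separator reads the whole text\n"   if $by_nul  eq $text;
print "undef separator reads the whole text\n" if $slurped eq $text;

# With the whole text in one record, the capture can span the line break
if ($by_nul =~ /\/attribute3="(.*?)"/s) {
    print "$1\n";
}
```

The sketch uses the non-greedy .*? so the capture stops at the first closing quote; the greedy .* in the one-liner would run to the last quotation mark in the file, which only matters if the file contains more quoted text later on.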

Upvotes: 1

Matt Jacob

Reputation: 6553

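use strict;
use warnings;

# Slurp everything after __DATA__ into one string ($/ is undefined locally)
my $str = do { local $/; <DATA> };

# [^"]* also matches newlines, so the capture spans all four lines
if ($str =~ /attribute3="([^"]*)"/) {
    (my $part_I_want = $1) =~ s/\n//g;   # drop the line breaks to rejoin wrapped words
    print "$part_I_want\n";
}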

__DATA__
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"

Upvotes: 2

Hellmar Becker

Reputation: 2982

Read the entire file into a single variable and use /attribute3=\"([^\"]*)\"/ms
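As a rough sketch of that suggestion, the script below slurps a file and applies the pattern. The helper name read_attribute3 and the temporary demo file are assumptions made for illustration; in the question the path would be data.txt. Because the pattern uses neither ^, $, nor ., the /m and /s modifiers do not change what it matches, so they are left out here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the quoted value of attribute3 from the file at $path,
# with the line breaks removed, or undef if it is not found.
sub read_attribute3 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    return undef unless $content =~ /attribute3="([^"]*)"/;
    (my $value = $1) =~ s/\n//g;            # rejoin the wrapped lines
    return $value;
}

# Demo on a temporary file holding a shortened, made-up sample
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} qq{/attribute2\n/attribute3="end*of*the*sente\nnce.*enough*words"\n};
close $tmp_fh;

print read_attribute3($tmp_path), "\n";
```

Removing the newlines rather than replacing them with spaces keeps a word that was wrapped mid-way, such as sente/nce in the question, in one piece.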

Upvotes: 1
