dreamer

Reputation: 478

Perl - how to grep a block of text from a file

The file can be XML or any other text format. In general, how do I grep for a block of text in Perl?

<track type="ws">
      <range>
       <rangeStart>0</rangeStart>
       <rangeEnd>146.912</rangeEnd>
       <locationIndex>0</locationIndex>
       <propertyIndex>0</propertyIndex>
      </range>
</track>
<track type="ps" id="1">
      <range>
       <rangeStart>0</rangeStart>
       <rangeEnd>146.912</rangeEnd>
       <locationIndex>1</locationIndex>
       <propertyIndex>1</propertyIndex>
      </range>
</track>

I want to grep for type="ps" and get everything up to the closing </range>.

One solution is to open the file, read it line by line and then match the block.

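use strict;
use warnings;

open(my $fh, '<', 'file.txt') or die "Cannot open file.txt: $!";
my $in_block = 0;
while (my $line = <$fh>) {
    $in_block = 1 if $line =~ /type="ps"/;   # block starts on this line
    print $line if $in_block;
    $in_block = 0 if $line =~ m{</range>};   # block ends after this line
}
close($fh);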

But is there a better way that doesn't read the file line by line?

Upvotes: 1

Views: 2089

Answers (3)

mirod

Reputation: 16171

For XML, look at xml_grep and xml_grep2. XML differs from plain text in that it is not line-oriented, so line-oriented tools like grep, sed, awk or ack are not guaranteed to work properly.

Upvotes: 0

Christopher Creutzig

Reputation: 8774

Bjørn is absolutely right about XML. For your more general question, you might also be interested in one of my favorite Perl one-liners:

perl -ne 'print if /type="ps"/../<\/range>/' input.txt
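The one-liner relies on `..` in scalar context acting as Perl's flip-flop operator: it becomes true on the first line matching the left pattern, stays true up to and including the line matching the right pattern, then resets and waits for the next start line. A minimal shell sketch, assuming a hypothetical file `tracks.txt` with two "ps" tracks and one "ws" track, shows that every matching block is printed and the "ws" track is skipped:

```shell
# Hypothetical sample file: two "ps" tracks around one "ws" track.
cat > tracks.txt <<'EOF'
<track type="ps" id="1">
  <range>
    <rangeEnd>146.912</rangeEnd>
  </range>
</track>
<track type="ws">
  <range>
    <rangeEnd>10</rangeEnd>
  </range>
</track>
<track type="ps" id="2">
  <range>
    <rangeEnd>20</rangeEnd>
  </range>
</track>
EOF

# Flip-flop: prints from each line containing type="ps" through the
# next line containing </range>, inclusive; other lines are skipped.
perl -ne 'print if /type="ps"/ .. /<\/range>/' tracks.txt
```

This prints the first four lines of each "ps" track (from the `<track ...>` line to `</range>`), eight lines in total. Because it only ever sees one line at a time, it depends on the tags being on separate lines.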

Upvotes: 5

Ask Bjørn Hansen

Reputation: 6943

Reading line by line only works if the XML is formatted with newlines like this, which it likely isn't. You should use a real XML parser.

If your data isn't too large (up to a few tens of MB), you might be able to read it with XML::Simple and then traverse the generated data structure. You should also have a look at XML::XPathEngine.

Upvotes: 3
