Reputation: 478
The input can be XML or any other text format. In general, how do I grep for a block of text in Perl?
<track type="ws">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
<locationIndex>0</locationIndex>
<propertyIndex>0</propertyIndex>
</range>
</track>
<track type="ps" id="1">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
<locationIndex>1</locationIndex>
<propertyIndex>1</propertyIndex>
</range>
</track>
I want to grep for type="ps" and get everything till the </range>.
One solution is to open the file, read it line by line and then match the block.
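open(my $fh, '<', 'file.txt') or die "Cannot open file.txt: $!";
my $in_block = 0;
while (my $line = <$fh>) {
    $in_block = 1 if $line =~ /type="ps"/;   # block starts on the matching <track> line
    print $line if $in_block;
    $in_block = 0 if $line =~ m{</range>};   # block ends on the closing </range> line
}
close($fh);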
But is there a better solution that does not require reading the file line by line?
Upvotes: 1
Views: 2089
Reputation: 16171
For XML, look at xml_grep and xml_grep2. XML differs from plain text in that it is not line-oriented, so line-oriented tools like grep, sed, awk or ack are not guaranteed to work properly on it.
Upvotes: 0
Reputation: 8774
Bjørn is absolutely right about XML. For your more general question, you might also be interested in one of my favorite Perl one-liners:
perl -ne 'print if /type="ps"/../<\/range>/' input.txt
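For anyone who wants the same idea inside a script, here is a minimal sketch. It shows the range (flip-flop) operator applied line by line, and then a slurp-style alternative that runs one regex over the whole text, which is roughly what the question asks for. The sample data is a trimmed copy of the XML from the question, and the variable names are mine. For a real file you could slurp it with something like `local $/; my $text = <$fh>;`. Like the one-liner, both approaches assume the markers sit on their own lines, so they are text tricks and not XML parsing.

```perl
use strict;
use warnings;

# Trimmed sample input based on the question (assumption: markers are on separate lines).
my $text = <<'END';
<track type="ws">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
</range>
</track>
<track type="ps" id="1">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
</range>
</track>
END

# Approach 1: the range operator in scalar context acts as a flip-flop.
# It turns true on a line matching the start pattern and stays true
# up to and including the next line matching the end pattern.
my $by_range = '';
for my $line (split /^/, $text) {    # split /^/ keeps the trailing newlines
    $by_range .= $line if $line =~ /type="ps"/ .. $line =~ m{</range>};
}

# Approach 2: no line loop, one regex over the whole string.
# /m lets ^ match at each line start, /s lets . cross newlines,
# and .*? stops at the first </range> after the start marker.
my @blocks   = $text =~ m{(^[^\n]*type="ps".*?</range>\n?)}msg;
my $by_regex = join '', @blocks;

print $by_range;
```

Both variables should end up holding the same block, from the `<track type="ps" ...>` line through `</range>`. The slurp version needs the whole input in memory, so the line-by-line form is the safer choice for very large files.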
Upvotes: 5
Reputation: 6943
Reading line by line will only work if the XML is formatted with newlines as in your example, which it likely is not in general. You should be using a real XML parser.
If your data isn't too large (a few MB, or perhaps a few tens of MB), you might be able to read it with XML::Simple and then traverse the generated data structure. You should also have a look at XML::XPathEngine.
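As a rough sketch of the XML::Simple route (it assumes the module is installed from CPAN): the `<tracks>` root element below is hypothetical, added only because a well-formed document needs a single root, and the variable names are mine. `ForceArray` and `KeyAttr` are set so that `track` always comes back as a plain list of hashes.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin by default

# Sample data from the question, wrapped in a hypothetical <tracks> root.
my $xml = <<'END';
<tracks>
<track type="ws">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
</range>
</track>
<track type="ps" id="1">
<range>
<rangeStart>0</rangeStart>
<rangeEnd>146.912</rangeEnd>
</range>
</track>
</tracks>
END

# ForceArray keeps <track> as an array even if there is only one;
# KeyAttr => [] stops XML::Simple from folding the list on the id attribute.
my $data = XMLin($xml, ForceArray => ['track'], KeyAttr => []);

# Pick the <range> of every track whose type attribute is "ps".
my @ps_ranges = map  { $_->{range} }
                grep { ($_->{type} // '') eq 'ps' }
                @{ $data->{track} };

for my $range (@ps_ranges) {
    print "rangeStart=$range->{rangeStart} rangeEnd=$range->{rangeEnd}\n";
}
```

Because this works on the parsed structure, it does not care how the XML is split across lines. To read from a file, pass the file name to `XMLin` in place of the string.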
Upvotes: 3