Husk01inJun

Reputation: 33

How to search in between `xml` tags in `perl`?

I have two XML files that look something like this:

file1.xml

<uf>232</uf>
<boid>32892</boid>
<end> End of xml 1 </end>



file2.xml

<id> 232 </id>
<boid></boid>
<end> End of xml 2 </end>



I have to write a function in Perl that copies the number between the <boid> tags of file1.xml and writes it between the <boid> tags of file2.xml.
The problem is that I am not allowed to include any parsing module, as it's an enhancement. I have tried something like this:

open(my $vt_open1 ,'<' "file1.xml");
open(my $vt_open2 ,'+>' "file2.xml");
select $vt_open2  or die $!;
while($vt_open1){
    if ($. == 2) {
        print $vt_open1;
    }

}

This is not working; it writes the entire file.
I am having trouble finding the right logic, and relying on the line number is not a good approach.
I am new to Perl and would appreciate the help.

Upvotes: 0

Views: 702

Answers (1)

Sobrique

Reputation: 53498

Don't. Use a library. Seriously. It's an utterly terrible idea to hack together your own parser just because you don't want to install one. XML is contextual. Regex is not. Parsing XML with regex will NEVER be better than a dirty hack, and you don't need to do it, because XPath exists.

Most standard distributions include XML::Twig as a package, so you don't even have to CPAN it. Or you can install it 'locally':

"How do I keep my own module library/directory"

You will always be creating brittle code by doing this.

However, just because I've been there and got stuck doing it:

#!/usr/bin/env perl
use strict;
use warnings;

my $xml1 = '
<xml>
<uf>232</uf>
<boid>32892</boid>
<end> End of xml 1 </end>
</xml>';

# Capture the text between the <boid> tags (list context returns the capture group).
my ( $boid_value ) = $xml1=~ m,<boid>([^<]+)</boid>,ms;
print "$boid_value\n";

my $xml2 = '
<xml>
<uf>232</uf>
<boid></boid>
<end> End of xml 2 </end>
</xml>';

$xml2 =~ s,<boid>[^<]*</boid>,<boid>$boid_value</boid>,ms;

print "Modified XML is:\n";
print $xml2;

One caveat: this will always be a risky choice, and may one day break entirely, because XML can be reformatted in a bunch of different ways that are semantically identical. Or someone might add an attribute to <boid> one day, or something similar, and the regex will simply stop matching.
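
Since the question works with files rather than strings, here is a rough sketch of the same regex hack applied to file1.xml and file2.xml using only core Perl. The helper names slurp, copy_boid and update_file are made up for illustration. It assumes both files are small enough to read into memory and that the tags are written exactly as <boid>...</boid>, with no attributes or whitespace inside the tag. Everything said above about brittleness still applies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    local $/;    # undef the record separator so <$fh> returns everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: copy the text of the first <boid> in $source
# into the first <boid> in $target. Both arguments are XML strings.
sub copy_boid {
    my ( $source, $target ) = @_;
    my ($boid_value) = $source =~ m{<boid>([^<]+)</boid>}
        or die "No non-empty <boid> element found in source\n";
    $target =~ s{<boid>[^<]*</boid>}{<boid>$boid_value</boid>};
    return $target;
}

# Hypothetical helper: read both files, then overwrite the target file.
# The target is fully read before it is reopened for writing.
sub update_file {
    my ( $source_path, $target_path ) = @_;
    my $updated = copy_boid( slurp($source_path), slurp($target_path) );
    open( my $out, '>', $target_path ) or die "Cannot write $target_path: $!";
    print {$out} $updated;
    close $out or die "Cannot close $target_path: $!";
}

# Only touches files when two paths are given on the command line.
update_file(@ARGV) if @ARGV == 2;
```

Saved under any script name, it would be run with the two paths as arguments, file1.xml first and file2.xml second. Reading the target completely before reopening it with '>' avoids the truncation that '+>' causes in the question's code.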

For the sake of comparison, with XML::Twig this looks like:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig; 

my $xml1 = '
<xml>
<uf>232</uf>
<boid>32892</boid>
<end> End of xml 1 </end>
</xml>';

my $xml2 = '
<xml>
<uf>232</uf>
<boid></boid>
<end> End of xml 2 </end>
</xml>';

my $twig = XML::Twig -> new -> parse ( $xml1 );
my $second_xml =  XML::Twig -> new -> parse ( $xml2 );

my $boid_value = $twig -> get_xpath('//boid',0)->text;

$_ -> set_text($boid_value) for $second_xml->get_xpath('//boid');

$second_xml -> set_pretty_print('indented');
$second_xml -> print;

Upvotes: 3
