gauss76

Reputation: 137

XML::Twig: use twig handlers or twig roots to update part of an XML file

I have been using the XML::Twig module in Perl for a few weeks now. So far I have been loading an entire XML file into memory, editing values in it, and finally saving it under a new name for further use.

Until now I have been dealing with fairly small XML files, but now I have to modify some very large ones (10,000+ lines).

There are hundreds of tags in these large files, but I only want to modify about 10 of them.

Is there a way to load just the tags I need to modify, change their values, and then save the result to a new XML file that contains all the information of the original but with those 10 tags modified?

I see in the XML::Twig documentation that twig handlers can be used to process just part of an XML document. However, in the examples I have tried, only the modified parts are written out as XML and the rest of the information is lost, which is no good to me.

Below is an example of the structure I am dealing with:

<datatag1 a="1A">
    <t>A</t>
</datatag1>
<datatag1 a="B2">
    <t>D</t>
</datatag1>
<datatag1 a="3C">
    <t>1</t>
</datatag1>
<datatag1 a="4S3">
    <t>14</t>
</datatag1>
<datatag1 a="5AA3">
    <t>1</t>
</datatag1>

What I would like to do is change the value of datatag1's child t from, say, A to B where a="1A". My modified XML would then be:

<datatag1 a="1A">
    <t>B</t>
</datatag1>
<datatag1 a="B2">
    <t>D</t>
</datatag1>
<datatag1 a="3C">
    <t>1</t>
</datatag1>
<datatag1 a="4S3">
    <t>14</t>
</datatag1>
<datatag1 a="5AA3">
    <t>1</t>
</datatag1>

Furthermore, I have a hash whose keys are the "a" values I want to modify and whose values are the new "t" values I want to insert.

Please let me know if you require any further information or anything is unclear.

Upvotes: 2

Views: 694

Answers (1)

Sobrique

Reputation: 53508

Yes, you absolutely can do this with XML::Twig.

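The key point is that a twig_handler fires while the parse is still in progress, as soon as the element it is attached to has been completely parsed. To write out the 'story so far' (and free the memory it used), you need to call flush; purge frees the memory too, but discards the content instead of printing it.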

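#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

# Called each time a complete <datatag1> element has been parsed.
sub modify_datatag {
    my ( $twig, $datatag ) = @_;

    # Change the <t> text only in the element whose "a" attribute is "1A".
    if ( ( $datatag->att('a') // '' ) eq '1A' ) {
        $datatag->first_child('t')->set_text('new text here');
    }

    # Print everything parsed so far (to STDOUT by default) and free it from
    # memory. Don't print anything else here, or the output is no longer XML.
    $twig->flush;
}

my $xml = XML::Twig->new( twig_handlers => { datatag1 => \&modify_datatag } );
$xml->parse( \*DATA );

# Print whatever is still in memory, e.g. the closing root tag.
$xml->flush;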


__DATA__
<xml>
<datatag1 a="1A">
    <t>A</t>
</datatag1>
<datatag1 a="B2">
    <t>D</t>
</datatag1>
<datatag1 a="3C">
    <t>1</t>
</datatag1>
<datatag1 a="4S3">
    <t>14</t>
</datatag1>
<datatag1 a="5AA3">
    <t>1</t>
</datatag1>
</xml>

Each time flush is called, everything parsed so far is printed (to STDOUT by default) and removed from memory. Any elements that are still being processed (i.e. not yet closed, such as the root) are kept in memory until a later flush.

You could call purge instead, but that discards the parsed content without printing it.
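Where purge does fit is the "twig roots" approach mentioned in the question title: build twigs only for the elements you want to change, print them yourself, and let XML::Twig copy everything else through unchanged. Below is a minimal sketch of that, using the `twig_roots` and `twig_print_outside_roots` options. It assumes the hash from the question is passed as a hash reference, that the input is a string with a single root element, and that both the outside-roots text and the element's `print` go to the currently selected filehandle (which is why an in-memory handle is selected before the twig is created). `update_with_roots` is a hypothetical name.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: return $xml_string with selected <t> values replaced.
# Only <datatag1> elements are built as twigs; everything outside them is
# copied to the output as-is by twig_print_outside_roots.
sub update_with_roots {
    my ( $xml_string, $new_t ) = @_;

    my $result = '';
    open my $out, '>', \$result or die "Cannot open in-memory handle: $!";
    my $old_fh = select $out;    # default output now goes to $result

    my $twig = XML::Twig->new(
        twig_roots => {
            datatag1 => sub {
                my ( $tw, $datatag ) = @_;
                my $key   = $datatag->att('a') // '';
                my $t_elt = $datatag->first_child('t');
                $t_elt->set_text( $new_t->{$key} )
                    if $t_elt && exists $new_t->{$key};
                $datatag->print;    # print the (possibly modified) element
                $tw->purge;         # already printed, so just free the memory
            },
        },
        twig_print_outside_roots => 1,
    );

    $twig->parse($xml_string);
    select $old_fh;                 # restore the previous default handle
    close $out;
    return $result;
}
```

For example, `update_with_roots( $xml, { '1A' => 'B' } )` returns the whole document with only the matching `<t>` changed. Here purge is safe because each element has already been printed before it is discarded.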

The above prints to STDOUT, but you can use parsefile_inplace to rewrite the existing file instead.
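A sketch of the in-place variant, driven by the hash from the question. parsefile_inplace selects a temporary file as the default output, parses the input and, if the parse succeeds, replaces the original (keeping a backup when a suffix is given). It does not flush for you, so this sketch assumes the root element is named `<root>` and flushes from a handler on it to write the closing tag; adjust that key to your real root name. `update_file_in_place` is a hypothetical name.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: rewrite $file in place, replacing the text of the <t>
# child of every <datatag1> whose "a" attribute is a key in %$new_t.
sub update_file_in_place {
    my ( $file, $new_t ) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            datatag1 => sub {
                my ( $tw, $datatag ) = @_;
                my $key   = $datatag->att('a') // '';
                my $t_elt = $datatag->first_child('t');
                $t_elt->set_text( $new_t->{$key} )
                    if $t_elt && exists $new_t->{$key};
                $tw->flush;    # write what has been parsed so far, free memory
            },
            # Assumed root name: flush the closing root tag at the very end.
            root => sub { $_[0]->flush },
        },
    );

    # Flushed output goes to a temp file that replaces $file on success;
    # the original is kept with a ".bak" suffix.
    $twig->parsefile_inplace( $file, '.bak' );
}
```

Usage would be something like `update_file_in_place( 'big.xml', \%new_t )`.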

You can also pass a filehandle to flush (e.g. $twig->flush($fh)), and it will write to that filehandle instead, which does what it says on the tin.
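Since the question asks for a new file rather than an in-place edit, here is a minimal sketch that flushes to an output filehandle and takes the replacement values from the hash described in the question (passed as a hash reference). The file names and the `copy_with_updates` name are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: read $in_file and write a modified copy to $out_file.
# %$new_t maps "a" attribute values to the new <t> text.
sub copy_with_updates {
    my ( $in_file, $out_file, $new_t ) = @_;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";

    my $twig = XML::Twig->new(
        twig_handlers => {
            datatag1 => sub {
                my ( $tw, $datatag ) = @_;
                my $key   = $datatag->att('a') // '';
                my $t_elt = $datatag->first_child('t');
                $t_elt->set_text( $new_t->{$key} )
                    if $t_elt && exists $new_t->{$key};
                $tw->flush($out);    # write everything parsed so far to $out
            },
        },
    );

    $twig->parsefile($in_file);
    $twig->flush($out);              # write the rest (e.g. the closing root tag)
    close $out or die "Cannot close $out_file: $!";
}
```

Only one `<datatag1>` at a time is held in memory, so file size matters much less than with a full parse.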

Upvotes: 3
