biji

Reputation: 59

How to eliminate tag names in an XML file using Perl

I have multiple XML files in a folder, so I wrote the following script to combine them into one XML file:

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
use Digest::MD5 'md5';

if ( @ARGV == 0 ) {
    push @ARGV, "c:/main/work";
    warn "Using default path $ARGV[0]\n  Usage: $0  path ...\n";
}

open( my $allxml, '>', "all_xml_contents.combined.xml" )
    or die "can't open output xml file for writing: $!\n";
print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
    "\n<Shiporder xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";

my %shipto_md5;    # digests of elements already written, used to skip duplicates
find(
    sub {
        # only process regular files whose names end in _stc.xml
        return unless ( /(_stc\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);

print $allxml "</Shiporder>\n";
close $allxml or die "can't close output xml file: $!\n";

sub extract_information {
    my $path = $_;
    if ( my $reader = XML::LibXML::Reader->new( location => $path ) ) {
        while ( $reader->nextElement('data') ) {
            my $elem = $reader->readOuterXml();
            my $md5  = md5($elem);
            # print each distinct element only once
            print $allxml $elem unless ( $shipto_md5{$md5}++ );
        }
    }
    return;
}

It prints all the XML files into one XML file like this:

 all_xml.combined.xml
 <?xml version="1.0" encoding="UTF-8"?>
<student specification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <student>
<name>johan</name>
 </student>

<student>
<name>benny</name>
</student>

 <student>
<name>kent</name>
 </student>

 </student specification>

However, one of the XML files contains an additional node. I tried to extract that information inside the while loop like this:

    $reader->nextElement( 'details' );
    my $information = $reader->readInnerXml();

How can I add this information to the output file? Please help me with this problem.

Upvotes: 1

Views: 272

Answers (3)

rpg

Reputation: 1652

Would it be possible for you to switch to XML::Twig? It provides an excellent way of handling tags.

You probably need something like this:

    my $twig = XML::Twig->new(
        twig_handlers => {
            # placeholder: replace with the element name or condition to match
            'student with specification' => sub { $_->delete },    # remove hidden elements
        },
    );

You will need to change the "student with specification" placeholder to suit your data. Sorry, I don't have much time; otherwise I would have written the complete code.
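As a rough illustration of the handler idea, here is a minimal sketch that parses an inline document and deletes every element with a given name. The sample XML, the `students` root, and the `details` element name are assumptions chosen for the example, not taken from the real files.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; the real files will look different.
my $xml = <<'XML';
<students>
  <student><name>johan</name><details>hidden</details></student>
  <student><name>benny</name></student>
</students>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each <details> element; $_ is the matched element.
        details => sub { $_->delete },
    },
);

$twig->parse($xml);

# Serialize what is left of the document after the deletions.
my $result = $twig->sprint;
print $result;
```

For files on disk, `parsefile` can be used in place of `parse`.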

Upvotes: 2

William Walseth

Reputation: 2923

Here's some code that does it using PHP's DOMDocument.

Overall: 1) create a parent document from a string or similar, 2) load each file, import its root element, and append it, 3) save the result.

It's usually better in XML programming to use XML parser functions, rather than string manipulation.

Good luck.

function loadXMLString( $strXML ) {
    $xmlDoc = new DOMDocument();
    $xmlDoc->formatOutput = true; 
    $xmlDoc->loadXML( $strXML );
    return $xmlDoc;
}

function loadXMLFile( $strFileName, $defaultXML=null ) {
    $xmlDoc = new DOMDocument();
    if( file_exists( $strFileName )  ){
        $xmlDoc->load( $strFileName );
    } else {
        if( $defaultXML == null  ) {
            throw new Exception( "Cannot locate file: " . $strFileName . " no default specified." );
        } else {
            // create it, if default XML is supplied
            return loadXMLString( $defaultXML );
        } 
    }
    return $xmlDoc;
}


$xmlMain = loadXMLString( "<xmlparent/>" );

$xmlChild = loadXMLFile( "test1.xml" );
$ndTemp = $xmlMain->importNode( $xmlChild->documentElement, true );
$xmlMain->documentElement->appendChild( $ndTemp );

$xmlChild = loadXMLFile( "test2.xml" );
$ndTemp = $xmlMain->importNode( $xmlChild->documentElement, true );
$xmlMain->documentElement->appendChild( $ndTemp );

$xmlMain->save( "all.xml" );
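Since the question is about Perl, the same create, import, append, save steps can be sketched with XML::LibXML. This is only an illustration under assumptions: the inline strings stand in for the real files, and the `students` root name is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Stand-ins for the contents of the individual files (hypothetical data).
my @sources = (
    '<?xml version="1.0"?><student><name>johan</name></student>',
    '<?xml version="1.0"?><student><name>benny</name></student>',
);

# 1) Create the parent document with a single root element.
my $combined = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root     = $combined->createElement('students');
$combined->setDocumentElement($root);

# 2) Parse each source, import its root element, and append it.
for my $source (@sources) {
    my $doc  = XML::LibXML->load_xml( string => $source );
    my $copy = $combined->importNode( $doc->documentElement );
    $root->appendChild($copy);
}

# 3) Serialize the result (1 = indented output).
my $output = $combined->toString(1);
print $output;
```

To work with files, `load_xml( location => $path )` reads a file and `$combined->toFile( $filename, 1 )` writes the result.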

Upvotes: 0

Dave Cross

Reputation: 69314

Three obvious points.

  1. You're loading the XML::LibXML module but not making any use of it.
  2. The problematic XML declaration is always the first line of the input files. So why not just skip the first line?
  3. The file you will end up with will not be valid XML. An XML document needs a single root element. So you'll need to create another element (perhaps <students>) that surrounds all of the data from the other files.
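Points 2 and 3 can be sketched in plain Perl as follows. This is a minimal illustration, assuming the declaration (when present) sits on its own first line; the inline strings stand in for the real files and the `students` root name is only an example.

```perl
use strict;
use warnings;

# Stand-ins for the contents of the input files (hypothetical data).
my @inputs = (
    qq{<?xml version="1.0" encoding="UTF-8"?>\n<student>\n<name>johan</name>\n</student>\n},
    qq{<?xml version="1.0" encoding="UTF-8"?>\n<student>\n<name>benny</name>\n</student>\n},
);

# Start the output with one declaration and one surrounding root element.
my $combined = qq{<?xml version="1.0" encoding="UTF-8"?>\n<students>\n};

for my $input (@inputs) {
    my @lines = split /^/, $input;    # split into lines, keeping line endings

    # Skip the first line if it is an XML declaration.
    shift @lines if @lines && $lines[0] =~ /^\s*<\?xml\b/;

    $combined .= join '', @lines;
}

$combined .= "</students>\n";
print $combined;
```

This keeps the string-based approach of the original script; a parser-based merge is more robust if the inputs are not laid out this regularly.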

Upvotes: 3
