Sparrow

Reputation: 148

Split an XML file into multiple files based on tags (UNIX shell)

I have an XML file containing a batch of items, like the one below.

I want to split this file into 5 files, one per Item tag, using shell scripting. Please help, thanks in advance.

<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
<Item>
<Title>Title 2</Title>
<DueDate>01-02-2009</DueDate>
</Item>
<Item>
<Title>Title 3</Title>
<DueDate>01-02-2010</DueDate>
</Item>
<Item>
<Title>Title 4</Title>
<DueDate>01-02-2011</DueDate>
</Item>
<Item>
<Title>Title 5</Title>
<DueDate>01-02-2012</DueDate>
</Item>
</Items>

Desired output (first of the 5 files):

<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
</Items>

Upvotes: 0

Views: 1583

Answers (1)

Sobrique

Reputation: 53498

I would suggest installing XML::Twig, which includes the rather handy xml_split utility. That may do what you need, e.g.:

xml_split -c Item

However, what you're trying to accomplish isn't entirely straightforward, because you need to cut the document up while keeping each piece well-formed XML. Standard line/regex-based tools can't do that reliably, since XML doesn't guarantee any particular line layout.
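To make that limitation concrete, here is a minimal sketch of what a line-based attempt has to assume. It is not a recommendation and not an XML parser: it only works on the sample as posted because every `<Item>` and `</Item>` tag sits alone on its own line. The file names (`items.xml`, `items_1.xml`, ...) and the trimmed two-item sample are assumptions for illustration; run it in an empty scratch directory.

```shell
#!/bin/sh
# Layout-dependent sketch: assumes <Item> and </Item> each occupy a whole line,
# exactly as in the question's sample. It does not understand XML.
set -eu

# Sample input in the question's layout (trimmed to two items for brevity).
cat > items.xml <<'EOF'
<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
<Item>
<Title>Title 2</Title>
<DueDate>01-02-2009</DueDate>
</Item>
</Items>
EOF

awk '
    /^<Item>$/ {                  # opening line: start the next output file
        n++
        out = "items_" n ".xml"
        print "<Items>" > out     # re-add the root element
    }
    out != "" { print > out }     # copy every line while inside an item
    /^<\/Item>$/ {                # closing line: finish this file
        print "</Items>" > out
        close(out)
        out = ""
    }
' items.xml
```

The same document with `<Item id="1">`, with two tags on one line, or with everything on a single line is still equivalent XML, yet none of those variants would match the patterns above. That is the reason to reach for a parser instead.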

However you can use a parser:

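#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

# Item elements collected by the handler, in document order.
my @item_list;

# Called once for each complete <Item>: detach it from the tree and keep it.
sub cut_item {
    my ( $twig, $item ) = @_;
    push( @item_list, $item->cut );
}

my $twig = XML::Twig->new(
    twig_handlers => { 'Item' => \&cut_item }
);

# Slurp STDIN (or the file named on the command line) into one string.
# parse(<>) would evaluate <> in list context and treat only the first
# line as the document.
my $xml = do { local $/; <> };
$twig->parse($xml);

my $itemcount = 1;

foreach my $element (@item_list) {
    # Build a fresh document with an <Items> root around this one item.
    my $newdoc = XML::Twig->new( 'pretty_print' => 'indented_a' );
    $newdoc->set_root( XML::Twig::Elt->new('Items') );
    $element->paste( $newdoc->root );

    $newdoc->print;    # also echo to STDOUT; drop this line if unwanted

    my $filename = "items_" . $itemcount++ . ".xml";
    open( my $output, ">", $filename ) or die "Cannot open $filename: $!";
    print {$output} $newdoc->sprint;
    close($output) or die "Cannot close $filename: $!";
}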

This uses the XML::Twig library to extract each of the Item elements from your XML (piped on STDIN, or named on the command line, e.g. myscript.pl yourfilename).

It then iterates over the ones it found, wraps each in a new Items root, and writes it to a separate file (items_1.xml, items_2.xml, and so on). This approach might take a little more fiddling if you had a more complex root, but it is adaptable if you do.

Upvotes: 1
