I have an XML file with batches like the one below.
I want to split this file into 5 files, one per <Item> element, using shell scripting. Please help, thanks in advance.
<Items>
  <Item>
    <Title>Title 1</Title>
    <DueDate>01-02-2008</DueDate>
  </Item>
  <Item>
    <Title>Title 2</Title>
    <DueDate>01-02-2009</DueDate>
  </Item>
  <Item>
    <Title>Title 3</Title>
    <DueDate>01-02-2010</DueDate>
  </Item>
  <Item>
    <Title>Title 4</Title>
    <DueDate>01-02-2011</DueDate>
  </Item>
  <Item>
    <Title>Title 5</Title>
    <DueDate>01-02-2012</DueDate>
  </Item>
</Items>
Desired output (e.g. the first of the five files):
<Items>
  <Item>
    <Title>Title 1</Title>
    <DueDate>01-02-2008</DueDate>
  </Item>
</Items>
I would suggest installing XML::Twig, which includes the rather handy xml_split utility. That may do what you need, e.g.:
xml_split -c Item
However, what you're trying to accomplish isn't especially easy, because you need to cut the document up while keeping the XML structure intact. You can't do that reliably with standard line- or regex-based tools.
You can, however, use a parser:
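#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my @item_list;

# Handler: called once for each completed <Item>. Cut it out of the
# source tree and keep it for later.
sub cut_item {
    my ( $twig, $item ) = @_;
    $item->cut;
    push( @item_list, $item );
}

my $twig = XML::Twig->new(
    twig_handlers => { 'Item' => \&cut_item }
);

# Slurp all of STDIN (or the file named on the command line) into one
# string. Calling parse(<>) in list context would only parse the first line.
my $xml = do { local $/; <> };
$twig->parse($xml);

my $itemcount = 1;
foreach my $element (@item_list) {
    # Build a new document with a fresh <Items> root and paste the item into it
    my $newdoc = XML::Twig->new( 'pretty_print' => 'indented_a' );
    $newdoc->set_root( XML::Twig::Elt->new('Items') );
    $element->paste( $newdoc->root );

    # Echo to STDOUT, then write items_1.xml, items_2.xml, ...
    $newdoc->print;
    my $filename = "items_" . $itemcount++ . ".xml";
    open( my $output, ">", $filename ) or die "Cannot open $filename: $!";
    print {$output} $newdoc->sprint;
    close($output) or die "Cannot close $filename: $!";
}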
This uses the XML::Twig library to extract each of the Item elements from your XML (piped on STDIN, or passed as a filename, e.g. myscript.pl yourfilename).
It then iterates over the items it found, wraps each one in a new Items root element, and writes it to its own file (items_1.xml, items_2.xml, and so on). If your real root element is more complex (for example, if it carries attributes, which this script does not copy), the approach needs a little more fiddling, but it can be adapted.
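If your real root element does carry attributes, one way to adapt the script is to copy the original root's tag name and attributes onto each new wrapper instead of hard-coding Items, and to write each item out directly from the handler rather than collecting everything in @item_list first. This is a sketch assuming XML::Twig is installed; the batch="42" attribute in the embedded sample and the write_item handler name are hypothetical, not part of the original question. To run it on a real file, replace the heredoc with my $xml = do { local $/; <> };

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input with an attribute on the root element.
# For a real file, use instead: my $xml = do { local $/; <> };
my $xml = <<'XML';
<Items batch="42">
  <Item><Title>Title 1</Title><DueDate>01-02-2008</DueDate></Item>
  <Item><Title>Title 2</Title><DueDate>01-02-2009</DueDate></Item>
</Items>
XML

my $count = 0;

# Handler (hypothetical name): runs as soon as each <Item> is fully parsed
sub write_item {
    my ( $twig, $item ) = @_;
    my $root = $twig->root;

    # New wrapper with the same tag name and a copy of the root's attributes
    my $wrapper = XML::Twig::Elt->new( $root->gi, { %{ $root->atts } } );

    # Detach the Item from the source tree and attach it to the wrapper
    $item->cut;
    $item->paste($wrapper);

    my $file = sprintf 'items_%d.xml', ++$count;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $wrapper->sprint, "\n";
    close $out or die "Cannot close $file: $!";
    return 1;
}

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { Item => \&write_item },
);
$twig->parse($xml);
print "Wrote $count file(s)\n";
```

Writing from inside the handler means each item leaves the source tree as soon as it is parsed, so nothing has to be kept in an array until the end of the document.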