Reputation: 8144
Hi, I'm working on a Perl script to split a big XML file into small chunks. I referred to this link: Split file by XML tag
and my code is like this:
if($line =~ /^</row>/)
{
$count++;
}
but I'm getting this error:
works\filesplit.pl line 20.
Bareword found where operator expected at E:\Work\perl works\filesplit.pl line 20, near "/^</row"
(Missing operator before row?)
syntax error at E:\Work\perl works\filesplit.pl line 20, near "/^</row"
Search pattern not terminated at E:\Work\perl works\filesplit.pl line 20.
Can anyone help me?
Update
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
Upvotes: 2
Views: 3012
Reputation: 16161
Have you tried xml_split? It's a tool that comes with XML::Twig and is specifically designed to split big XML files based on a variety of criteria (tag name, level, size).
Upvotes: 3
Reputation: 6204
Perhaps the following will be helpful:
use strict;
use warnings;
my $i = 1;
local $/ = '<row>';
while (<>) {
chomp;
s!</row>!! or next;
open my $fh, '>', 'File_' . ( sprintf '%05d', $i++ ) . '.xml' or die $!;
print $fh $_;
}
Usage: perl script.pl inFile.xml
This sets Perl's record separator $/ to <row> so that the XML file is read in 'chunks' delimited by <row>. chomp strips the trailing <row> separator from each chunk. The script then removes the </row> from the chunk (skipping any chunk that has none) and writes the chunk to a file named according to the scheme "File_nnnnn.xml".
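If each output chunk should keep its <row>...</row> wrapper, one variation is to use the closing tag as the record separator instead. The following is only a minimal sketch under that assumption: split_rows is a hypothetical helper name, and it reads from an in-memory string rather than a file so that it is easy to try out.

```perl
use strict;
use warnings;

# Hypothetical helper: split XML text into one string per <row> element,
# keeping the <row>...</row> wrapper around each chunk.
sub split_rows {
    my ($xml) = @_;
    my @chunks;
    local $/ = '</row>';    # each read ends just after a closing tag
    open my $in, '<', \$xml or die "Cannot read from string: $!";
    while (my $chunk = <$in>) {
        next unless $chunk =~ /<row>/;     # skip trailing text with no row in it
        $chunk =~ s/\A.*?(?=<row>)//s;     # drop anything before the opening tag
        push @chunks, $chunk;
    }
    close $in;
    return @chunks;
}

my $sample = "<row>\n<country>11</country>\n</row>\n"
           . "<row>\n<country>12</country>\n</row>\n";
my @rows = split_rows($sample);
printf "%d chunks\n", scalar @rows;
```

Each element of @rows could then be printed to its own file in the same way as in the script above. Like that script, this treats the XML as plain text, so it assumes that <row> elements are not nested.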
Upvotes: 2
Reputation: 1484
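#!/bin/perl -w
## Extract the first <tag>...</tag> block from an XML file using a Perl script
use strict;
use warnings;

print "Input File ? ";
chomp(my $XmlFile = <STDIN>);
open my $XmlFileHandle, '<', $XmlFile or die "Cannot open $XmlFile: $!";
print "\nSplit By which Tag ? ";
chomp(my $splitby = <STDIN>);
open my $OutputHandle, '>', 'OutputFile_' . $splitby
    or die "Cannot write OutputFile_$splitby: $!";

## e.g. to split by <user>...</user>, enter "user"
## Skip ahead to the first opening tag
while (<$XmlFileHandle>) {
    if (/<\Q$splitby\E>/) {
        print $OutputHandle "<$splitby>\n";
        last;
    }
}
## Copy lines until the matching closing tag is reached
while (my $line = <$XmlFileHandle>) {
    if ($line =~ m/<\/\Q$splitby\E>/) {
        print $OutputHandle "</$splitby>";
        last;
    }
    print $OutputHandle $line;
}
close $OutputHandle or die "Cannot close OutputFile_$splitby: $!";
print "\nOutput File is : OutputFile_$splitby\n";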
Upvotes: 0
Reputation: 23502
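You need ^<\/row>, provided that you are trying to match </row> at the beginning of the line. The forward slash has to be escaped because / is also the delimiter of the regex; left unescaped, it ends the pattern early, which is what produces the syntax error. Here is my test code.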
#!/usr/bin/perl
use strict;
use warnings;
my $line = "</row> something";
if ($line =~ /^<\/row>/)
{
print "found a match \n";
}
OUTPUT:
# perl test.pl
found a match
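Another way to avoid the backslash is to pick a different delimiter for the match operator, for example m{...}, so that the slash in the tag no longer needs escaping. A small sketch, where is_row_end is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the line starts with a closing </row> tag.
sub is_row_end {
    my ($line) = @_;
    # With braces as delimiters, the / inside the tag is an ordinary character.
    return $line =~ m{^</row>} ? 1 : 0;
}

print is_row_end("</row> something") ? "found a match\n" : "no match\n";
print is_row_end("<row>")            ? "found a match\n" : "no match\n";
```

This is equivalent to /^<\/row>/ and is often easier to read when the pattern contains slashes, as XML closing tags and paths do.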
Update
I am posting this update after the OP provided sample data.
You need ^\s*<\/row> in your regex because not all of the closing tags start at the beginning of the line; some of them have a space before them. Hence we need to match zero or more whitespace characters at the beginning of the line before the actual match.
Code:
#!/usr/bin/perl -w
use strict;
use warnings;
while (my $line = <DATA>)
{
if ($line =~ /^\s*<\/row>/)
{
print "found a match \n";
}
}
__DATA__
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
<row>
<date></date>
<ForeignpostingId />
<country>11</country>
<domain>http://www.xxxx.com</domain>
<domainid>20813</domainid>
</row>
Output:
# perl test.pl
found a match
found a match
found a match
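To connect this back to the original goal of splitting the file, the same match can drive a counter that closes a chunk every N rows. The following is a rough sketch only: chunk_rows and its parameters are hypothetical names, the pattern allows zero or more leading whitespace characters, and the chunks are collected in memory instead of being written to files.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into chunks of at most $per_chunk rows.
# A chunk is closed once $per_chunk closing </row> lines have been seen.
sub chunk_rows {
    my ($per_chunk, @lines) = @_;
    my @chunks;
    my $current = '';
    my $count   = 0;
    for my $line (@lines) {
        $current .= $line;
        next unless $line =~ /^\s*<\/row>/;    # closing tag, optional indentation
        if (++$count % $per_chunk == 0) {
            push @chunks, $current;
            $current = '';
        }
    }
    push @chunks, $current if length $current;    # leftover partial chunk
    return @chunks;
}

# Five small rows, grouped two per chunk.
my @lines  = map { ("<row>\n", "<id>$_</id>\n", " </row>\n") } 1 .. 5;
my @chunks = chunk_rows(2, @lines);
printf "%d chunks\n", scalar @chunks;
```

For a really large file it would be better to print each chunk to its own output file as soon as it is complete, rather than keeping every chunk in an array.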
Upvotes: 2