Reputation: 15876
I have an XML file that has around 150k records. The format of the record is shown below:
<product>
<product_id>1</product_id>
<product_name>ABC1</product_name>
</product>
<product>
<product_id>2</product_id>
<product_name>ABC2</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC3</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC4</product_name>
</product>
<product>
<product_id>4</product_id>
<product_name>ABC5</product_name>
</product>
<product>
<product_id>5</product_id>
<product_name>ABC6</product_name>
</product>
<product>
<product_id>6</product_id>
<product_name>ABC7</product_name>
</product>
When I load the above file, I get unique constraint violation errors, meaning that some of the records share the same product_id, which the database does not allow.
Is there an easy way in vi to parse the file and display all the products that use a non-unique ID (based on the product_id tag)? For example, the sample above has two products sharing the ID 3.
Upvotes: 0
Views: 242
Reputation: 698
I believe the right way to do this is to write a Perl script that processes the XML tree and throws meaningful errors. Such a script would most likely use an existing Perl package for handling XML files, such as XML::Parser.
Best Regards, Nadav.
Upvotes: 1
Reputation: 36272
Based on Nadav's suggestion but with a different parser, here is an approach using Perl and its XML::Twig
module. It prints all repeated ids separated by commas:
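#!/usr/bin/env perl
use warnings;
use strict;
use XML::Twig;

# %seen counts occurrences of each id; @rep_ids collects each repeated id once.
my (@rep_ids, %seen);

XML::Twig->new(
    # Only build twigs for product_id elements; the rest of the file is skipped.
    twig_roots => {
        'product/product_id' => sub {
            my $id = $_->text_only;
            # Record the id the second time it is seen, so it is listed only once.
            push @rep_ids, $id if ++$seen{ $id } == 2;
        },
    },
)->parsefile( shift );

printf qq|%s\n|, join q|,|, @rep_ids;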
Run it like:
perl script.pl xmlfile
That yields:
3
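If installing XML::Twig is not an option, a rougher check is possible with standard tools. The following is only a sketch, and it assumes that every product_id element sits on its own line, as in the sample from the question. It does not parse XML, so it will miss ids that are split across lines or wrapped in CDATA. The function name find_dup_ids, the products root element and the temporary sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch: list product_id values that occur more than once.
# Assumption: each <product_id>...</product_id> element is on a single line.
find_dup_ids() {
    # Extract the text between the tags, then keep only values seen 2+ times.
    sed -n 's:.*<product_id>\([^<]*\)</product_id>.*:\1:p' "$1" |
        sort |
        uniq -d
}

# Hypothetical sample file mirroring some of the records in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<products>
<product>
<product_id>1</product_id>
<product_name>ABC1</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC3</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC4</product_name>
</product>
<product>
<product_id>4</product_id>
<product_name>ABC5</product_name>
</product>
</products>
EOF

find_dup_ids "$sample"
```

Because uniq -d prints one copy of each duplicated line, every repeated id is listed once. From inside vi, the same pipeline can be run on the current file by replacing "$1" with % in a :! command.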
Upvotes: 1