ziggy

Reputation: 15876

Searching an XML file for duplicate lines or duplicate tags in VI

I have an XML file that has around 150k records. The format of the record is shown below:

<product>
<product_id>1</product_id>
<product_name>ABC1</product_name>
</product>
<product>
<product_id>2</product_id>
<product_name>ABC2</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC3</product_name>
</product>
<product>
<product_id>3</product_id>
<product_name>ABC4</product_name>
</product>
<product>
<product_id>4</product_id>
<product_name>ABC5</product_name>
</product>
<product>
<product_id>5</product_id>
<product_name>ABC6</product_name>
</product>
<product>
<product_id>6</product_id>
<product_name>ABC7</product_name>
</product>

When I load the above file, I get unique constraint violation errors, meaning that some of the records use the same product_id, which the database does not allow.

Is there an easy way in VI to parse the file and display all the products that use a non-unique ID (based on the product_id tag)? As an example, the above sample has two products using the same ID of 3.

Upvotes: 0

Views: 242

Answers (2)

Nadav

Reputation: 698

I believe that the right way to do this is to write a Perl script that processes the XML tree and throws meaningful errors. Most likely such a script would use an existing Perl package for handling XML files, such as XML::Parser.
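
As a rough illustration of that idea, here is a minimal sketch using XML::Parser's event handlers. It is only one possible way to do it, and it makes some assumptions: the product records are wrapped in a single root element (XML::Parser rejects a file with several top-level elements), and the sub name product_id_lines is made up for this example. It records the line on which each product_id starts, so that duplicates can be looked up in the editor afterwards.

```perl
#!/usr/bin/env perl
# Sketch only: assumes the <product> records sit inside one root element.
use strict;
use warnings;
use XML::Parser;

# Return a hash ref mapping each product_id value to the list of line
# numbers on which a <product_id> element with that value starts.
# (product_id_lines is a hypothetical helper name.)
sub product_id_lines {
    my ($file) = @_;
    my ( %lines, $in_id, $text, $line );

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ( $expat, $element ) = @_;
                return unless $element eq 'product_id';
                ( $in_id, $text, $line ) = ( 1, '', $expat->current_line );
            },
            # Character data may arrive in several chunks, so accumulate it.
            Char => sub {
                my ( $expat, $string ) = @_;
                $text .= $string if $in_id;
            },
            End => sub {
                my ( $expat, $element ) = @_;
                return unless $element eq 'product_id';
                push @{ $lines{$text} }, $line;
                $in_id = 0;
            },
        },
    );
    $parser->parsefile($file);
    return \%lines;
}

# Report every id used more than once in each file named on the command line.
for my $file (@ARGV) {
    my $lines = product_id_lines($file);
    for my $id ( sort keys %$lines ) {
        my @at = @{ $lines->{$id} };
        next if @at < 2;
        printf "%s: product_id %s is used %d times, at lines %s\n",
            $file, $id, scalar @at, join( ', ', @at );
    }
}
```

Saved under a name of your choice (find_dups.pl here is just a placeholder), it would be run as perl find_dups.pl xmlfile. In vi, typing : followed by one of the reported line numbers jumps straight to that record.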

Best Regards, Nadav.

Upvotes: 1

Birei

Reputation: 36272

Based on Nadav's suggestion, but with a different parser, here is an approach using Perl and its XML::Twig module. It prints all repeated ids separated by commas:

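#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

# %id counts how often each product_id occurs; @rep_ids lists each repeated id once.
my (@rep_ids, %id);

XML::Twig->new(
    # Build twigs only for product_id elements; the rest of the file is skipped.
    twig_roots => {
        'product/product_id' => sub {
            my $id = $_->text_only;
            # Record the id on its second occurrence, so it is reported only once.
            push @rep_ids, $id if ++$id{ $id } == 2;
        },
    },
)->parsefile( shift );

printf qq|%s\n|, join q|,|, @rep_ids;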

Run it like:

perl script.pl xmlfile

That yields:

3

Upvotes: 1
