Ravi M

Reputation: 81

Perl hash of arrays: how to check whether an entry exists before adding it

I have this code that reads rows and places them into %data. One row in DATA (the last row) is a duplicate, so I don’t want it to be added to %data. How do I check that the app_id and ci_name combination doesn’t already exist before pushing the row into %data? Something like

push .. unless {app_id already exists}

The code to modify:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %data;

while( <DATA> ) {
    chomp;
    next if /app_id/;
    my ($app_id,$ci_name,$app_name) = split /,/;
    push @{$data{$ci_name}}, {app_id => $app_id, app_name => $app_name };
}

print Dumper(\%data);

__DATA__
app_id,ci_name,app_name
1234,hosta7,Managed File Transfer
1235,hosta7,Patrtol
1236,hosta7,RELATIONAL DATA WAREHOUSE
1237,hosta7,Managed File Transfer
1238,hosta7,Initio Application
1239,hosta7,Data Warehouse Operations Infrastructure
2345,hostb,Tableou
2345,hostb,Tableou

Upvotes: 1

Views: 159

Answers (3)

Helmut Wollmersdorfer

Reputation: 451

If you want to keep all records (adapted from @ikegami):

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

<DATA>;  # Skip header.

my %data;

while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    push @{ $data{$ci_name}{$app_id} }, { app_id => $app_id, app_name => $app_name };
}

print Dumper(\%data);

But since the records stored under each key would then be identical, it would be better to simply count them:

$data{$ci_name}{$app_id}{$app_name}++;
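
As a rough sketch of how that counting structure could be filled and read back (the helper name count_rows and the shortened sample rows are assumptions made for illustration, not part of the answer above):

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each (ci_name, app_id, app_name)
# combination occurs. Rows are CSV strings with the header already removed.
sub count_rows {
    my @rows = @_;
    my %count;
    for my $row (@rows) {
        my ($app_id, $ci_name, $app_name) = split /,/, $row;
        $count{$ci_name}{$app_id}{$app_name}++;
    }
    return \%count;
}

my $count = count_rows(
    '1234,hosta7,Managed File Transfer',
    '2345,hostb,Tableou',
    '2345,hostb,Tableou',
);

# Report every combination that was seen more than once.
for my $ci_name (sort keys %$count) {
    for my $app_id (sort keys %{ $count->{$ci_name} }) {
        for my $app_name (sort keys %{ $count->{$ci_name}{$app_id} }) {
            my $n = $count->{$ci_name}{$app_id}{$app_name};
            print "$ci_name/$app_id/$app_name seen $n times\n" if $n > 1;
        }
    }
}
```

With this layout nothing is discarded: a duplicate row only raises a counter, so you can still tell afterwards which combinations were repeated.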

Upvotes: 0

ikegami

Reputation: 386696

You could temporarily use a hash of hashes (HoH) instead of a hash of arrays (HoA).

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

<DATA>;  # Skip header.

my %data;
while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    $data{$ci_name}{$app_id} //= { app_id => $app_id, app_name => $app_name };
}

# Convert HoH to HoA.
$data{$_} = [ values(%{ $data{$_} }) ]
   for keys(%data);

print Dumper(\%data);

The above keeps the first of the duplicates, and it doesn't preserve order. Change //= to = to keep the last of the duplicates. Read on for a solution that preserves order.
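
To make the difference between //= and = concrete, here is a minimal sketch. The two-element rows and the labels 'first' and 'second' are made up for illustration, and //= needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Two rows with the same app_id; the label shows which one was kept.
my @rows = ( [ 2345, 'first' ], [ 2345, 'second' ] );

my (%keep_first, %keep_last);
for my $row (@rows) {
    my ($app_id, $label) = @$row;
    $keep_first{$app_id} //= $label;   # assigns only while the slot is undef
    $keep_last{$app_id}    = $label;   # always overwrites
}

print "keep first: $keep_first{2345}\n";   # keep first: first
print "keep last:  $keep_last{2345}\n";    # keep last:  second
```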


The following is a common way of removing duplicates while preserving order:

my %seen;
my @uniq = grep !$seen{$_}++, @values;
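
For anyone who hasn't met the idiom before, a small self-contained run of it might look like this (the host names in @values are just sample input):

```perl
use strict;
use warnings;

my @values = qw(hosta7 hostb hosta7 hostc hostb);

my %seen;
# $seen{$_}++ returns the previous count: 0 (false) on the first sighting,
# so the negation keeps the element; later sightings return >= 1 and are dropped.
my @uniq = grep { !$seen{$_}++ } @values;

print "@uniq\n";   # hosta7 hostb hostc
```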

We can adapt that idiom to our needs.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

<DATA>;  # Skip header.

my %data;
my %seen;
while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name }
       if !$seen{$ci_name}{$app_id}++;
}

print Dumper(\%data);

The above keeps the first of the duplicates, and it preserves order.
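
If you prefer a flat %seen over a nested one, the same check could be sketched with a joined key. This is only a variation on the code above: the helper name build_data is hypothetical, and it assumes no field ever contains a NUL character, which is used as the separator.

```perl
use strict;
use warnings;

# Hypothetical helper: build the hash of arrays, skipping any row whose
# (ci_name, app_id) pair has already been seen. Input order is preserved.
sub build_data {
    my @rows = @_;
    my (%data, %seen);
    for my $row (@rows) {
        my ($app_id, $ci_name, $app_name) = split /,/, $row;
        my $key = join "\0", $ci_name, $app_id;   # assumes no NUL in the fields
        next if $seen{$key}++;
        push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name };
    }
    return \%data;
}

my $data = build_data(
    '2345,hostb,Tableou',
    '2346,hostb,Example App',
    '2345,hostb,Tableou',
);
print scalar @{ $data->{hostb} }, " rows kept for hostb\n";   # 2 rows kept for hostb
```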


Both of these solutions run in O(N) time, whereas the grep-based solution is O(N^2) in the worst case, so they scale much better. To be honest though, the grep-based solution is O(N) in practice unless a lot of rows share the same ci_name.


Note how I added <DATA> before the loop to skip the header? It's far better than skipping every line that contains app_id anywhere in the line, which would also drop a data row that happened to contain that text.

Upvotes: 3

sticky bit

Reputation: 37507

You can use grep() with a block that checks whether the app_id of an existing entry equals the one to be inserted.

...
push @{$data{$ci_name}}, {app_id => $app_id, app_name => $app_name }
    unless grep { $_->{'app_id'} == $app_id } @{$data{$ci_name}};
...
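
A possible variation on the same idea, sketched here with any() from List::Util (available from List::Util 1.33, which ships with Perl 5.20 and later), stops scanning at the first match instead of walking the whole array. The in-memory @rows and the use of eq rather than == are assumptions for this sketch; eq also works if an app_id is not purely numeric.

```perl
use strict;
use warnings;
use List::Util qw(any);

my %data;
my @rows = (
    [ 2345, 'hostb', 'Tableou' ],
    [ 2345, 'hostb', 'Tableou' ],   # duplicate, should be skipped
);

for my $row (@rows) {
    my ($app_id, $ci_name, $app_name) = @$row;
    # any() returns true as soon as one element matches; the `// []`
    # supplies an empty list when this ci_name has not been seen yet.
    next if any { $_->{app_id} eq $app_id } @{ $data{$ci_name} // [] };
    push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name };
}

print scalar @{ $data{hostb} }, "\n";   # 1
```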

Upvotes: 2
