I have this code to take rows and place them into %data. One row in DATA (the last row) is a duplicate, so I don't want it to be added to %data. How do I check that the app_id and ci_name combination doesn't already exist before pushing the row into %data? Something like
push .. unless {app_id already exists}
The code to modify:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %data;
while( <DATA> ) {
    chomp;
    next if /app_id/;
    my ($app_id,$ci_name,$app_name) = split /,/;
    push @{$data{$ci_name}}, {app_id => $app_id, app_name => $app_name };
}
print Dumper(\%data);
__DATA__
app_id,ci_name,app_name
1234,hosta7,Managed File Transfer
1235,hosta7,Patrtol
1236,hosta7,RELATIONAL DATA WAREHOUSE
1237,hosta7,Managed File Transfer
1238,hosta7,Initio Application
1239,hosta7,Data Warehouse Operations Infrastructure
2345,hostb,Tableou
2345,hostb,Tableou
If you want to keep all records (adapted from @ikegami):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
<DATA>; # Skip header.
my %data;
while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    push @{ $data{$ci_name}{$app_id} }, { app_id => $app_id, app_name => $app_name };
}
print Dumper(\%data);
But in that case, it would be better to write:
$data{$ci_name}{$app_id}{$app_name}++;
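A minimal sketch of that counting variant, assuming the same comma-separated rows as the question (inlined here as a list instead of __DATA__ so the example is self-contained). Each app_id/app_name pair is counted per ci_name, so a count above 1 flags a duplicate row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows inlined for illustration (assumption: same format as the question's __DATA__).
my @lines = (
    '1237,hosta7,Managed File Transfer',
    '2345,hostb,Tableou',
    '2345,hostb,Tableou',
);

my %data;
for my $line (@lines) {
    my ($app_id, $ci_name, $app_name) = split /,/, $line;
    # Nested hash keyed by host, then id, then name; the value counts occurrences.
    $data{$ci_name}{$app_id}{$app_name}++;
}

# Walk the structure in sorted order and report how often each record was seen.
for my $ci_name (sort keys %data) {
    for my $app_id (sort keys %{ $data{$ci_name} }) {
        for my $app_name (sort keys %{ $data{$ci_name}{$app_id} }) {
            my $count = $data{$ci_name}{$app_id}{$app_name};
            print "$ci_name $app_id $app_name: $count\n";
        }
    }
}
```

This prints "hosta7 1237 Managed File Transfer: 1" followed by "hostb 2345 Tableou: 2".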
You could temporarily use a hash of hashes (HoH) instead of a hash of arrays (HoA), then convert it back.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
<DATA>; # Skip header.
my %data;
while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    # Keep only the first record seen for each ci_name/app_id pair.
    $data{$ci_name}{$app_id} //= { app_id => $app_id, app_name => $app_name };
}
# Convert HoH to HoA.
$data{$_} = [ values(%{ $data{$_} }) ]
    for keys(%data);
print Dumper(\%data);
The above keeps the first of the duplicates, and it doesn't preserve order. Change //= to = to keep the last of the duplicates. Read on for a solution that preserves order.
The following is a common way of removing duplicates while preserving order:
my %seen;
my @uniq = grep !$seen{$_}++, @values;
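A small self-contained run of that idiom, with a made-up @values list for illustration. $seen{$_}++ returns the previous count (undef the first time), so the negation is true only the first time a value appears.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input list with repeats.
my @values = qw(b a b c a);

my %seen;
# Keep an element only on its first appearance; later copies see a true count.
my @uniq = grep !$seen{$_}++, @values;

print "@uniq\n";    # b a c
```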
We can adapt that idiom to our needs.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
<DATA>; # Skip header.
my %data;
my %seen;
while (<DATA>) {
    chomp;
    my ($app_id, $ci_name, $app_name) = split /,/;
    push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name }
        if !$seen{$ci_name}{$app_id}++;
}
print Dumper(\%data);
The above keeps the first of the duplicates, and it preserves order.
Both of these solutions run in O(N) time, whereas the grep-based solution is O(N²), so these scale much better. To be honest, though, the grep-based solution is practically O(N) unless a lot of rows share the same ci_name, since grep only scans the records already stored for that ci_name.
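A variation on the order-preserving loop, not taken from the answer above: a flat %seen keyed on the joined ci_name/app_id pair instead of a nested hash. It relies on the assumption that the fields were produced by splitting on commas, so none of them can contain a comma and the joined key is unambiguous. Rows are inlined instead of read from DATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Rows inlined for illustration; the last one repeats the previous row.
my @lines = (
    '1239,hosta7,Data Warehouse Operations Infrastructure',
    '2345,hostb,Tableou',
    '2345,hostb,Tableou',
);

my %data;
my %seen;
for my $line (@lines) {
    my ($app_id, $ci_name, $app_name) = split /,/, $line;
    # Fields came from split /,/, so they hold no commas; ',' is a safe separator.
    my $key = join ',', $ci_name, $app_id;
    push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name }
        if !$seen{$key}++;
}

print Dumper(\%data);
```

The cost is the same as the nested version; the flat key just avoids a second level of hashes in %seen.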
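Note how I added <DATA>; before the loop to skip the header? That's far better than skipping every line that contains app_id anywhere in the line, which would also silently drop any data row that happens to contain that text!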
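Another way to skip only the header while keeping the check inside the loop is the built-in $. variable, which holds the line number of the last filehandle read. This is a sketch; an in-memory filehandle over a string stands in for DATA (an assumption made so the example is self-contained).

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $csv = <<'END';
app_id,ci_name,app_name
1234,hosta7,Managed File Transfer
2345,hostb,Tableou
END

# Open the string as a read-only in-memory filehandle (stand-in for DATA).
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    next if $. == 1;    # $. is the number of the line just read; line 1 is the header
    chomp $line;
    push @rows, [ split /,/, $line ];
}
close $fh;

print scalar(@rows), " data rows\n";    # 2 data rows
```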
You can use grep() with a block that checks whether any record already stored for that ci_name has the same app_id as the one to be inserted.
...
push @{$data{$ci_name}}, {app_id => $app_id, app_name => $app_name }
    unless grep { $_->{'app_id'} == $app_id } @{$data{$ci_name}};
...
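A variant sketch of the same check using any from the core List::Util module (version 1.33 or later) instead of grep: any stops at the first match, whereas grep always scans the whole list. The // [] guard avoids dereferencing undef when a ci_name has no records yet, and the comparison uses eq on the assumption that app_id is best treated as a string. Rows are inlined instead of read from DATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any);
use Data::Dumper;

# Rows inlined for illustration; the last one repeats the previous row.
my @lines = (
    '1236,hosta7,RELATIONAL DATA WAREHOUSE',
    '2345,hostb,Tableou',
    '2345,hostb,Tableou',
);

my %data;
for my $line (@lines) {
    my ($app_id, $ci_name, $app_name) = split /,/, $line;
    # Skip the push if a record with this app_id is already stored for this ci_name.
    push @{ $data{$ci_name} }, { app_id => $app_id, app_name => $app_name }
        unless any { $_->{app_id} eq $app_id } @{ $data{$ci_name} // [] };
}

print Dumper(\%data);
```

Like the grep version, this still scans the stored records for each row, so the hash-based %seen approach remains the better fit when one ci_name has many rows.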