Reputation: 524
I've gotten into a bit of a jam and am hoping someone can clear it up. Here is what I want to do.
First I slurp a file into a variable, use a regex to extract my data, and put it into an array:
while (my $row = <$fh>) {
    unless ($. == 0) {
        {
            local $/;    # enable slurp for the rest of the file
            # $1 = article number, $2 = quantity, $3 = unit
            @datalist = <$fh> =~ /\s*\d*\/\s*\d*\|\s*(.*?)\|.*?(?:.*?\|){4}\s*(\S*)\|(\S*).*\|/g;
        }
        push(@arrayofarrays, [@datalist]);
        push(@filenames, $file);
        last;
    }
}
$numr++;
}    # closes an enclosing loop over the files (its opening is not shown)
open(my $feh, '>', 'test.txt') or die "Can't open test.txt: $!";
print {$feh} Dumper \@arrayofarrays;
A Dumper shows that my data looks fine (pseudo-results, shortened to keep this easy to read):
$VAR1 = [
          [
            'data type1',
            'data type2',
            'data type3',
            'data type1',
            'data type2',
            'data type3',
            ...
          ],
          [
            'data type1',
            'data type2',
            'data type3',
            ...
          ],
          ...
        ];
So I'm wondering if anyone knows an easy way to check for duplicates between these sets of data. I know how to print the individual data sets; what I tried might give a better idea of what I need to do:
my $i = 0;
my $j = 0;
while ($i < scalar @arrayofarrays) {
    $j = 0;
    while ($j < scalar @arrayofarrays) {
        # 'eq' here compares the arrays in scalar context,
        # i.e. their element counts, not their contents
        if (@{$arrayofarrays[$i]} eq @{$arrayofarrays[$j]}) {
            print "\n'$filenames[$i]' is duplicate to '$filenames[$j]'.";
        }
        $j++;
    }
    $i++;
}
Upvotes: 0
Views: 52
Reputation: 641
Instead of an array of arrays I'd create a hash of arrays, producing the keys from the subarrays' data by flattening each subarray to a string, optionally turning the strings into checksums (appropriate for multidimensional subarrays). You may want to read this discussion on PerlMonks:
http://www.perlmonks.org/?node_id=1121378
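Flattening to checksums might look like the following sketch; it assumes the core modules Storable and Digest::MD5, and checksum_key is just a name I made up:
use Digest::MD5 qw(md5_hex);
use Storable qw(freeze);

$Storable::canonical = 1;    # identical structures always freeze to the same bytes

# hypothetical helper: serialize an arbitrarily nested array ref and hash
# the result, so equal data always yields an equal key
sub checksum_key {
    my $aref = shift;
    return md5_hex( freeze($aref) );
}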
An abstract example, given an already existing array with duplicate data in the subarrays (you can test it on ideone.com):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @array = (
    [1,'John','ABXC12132328'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
);

my %uniq_helper = ();
my @uniq_data   = grep { !$uniq_helper{"@$_"}++ } @array;

print Dumper(\%uniq_helper) . "\n";
print Dumper(\@uniq_data) . "\n";
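One caveat on the "@$_" key: it interpolates the subarray with spaces between the elements, so subarrays whose elements themselves contain spaces could produce colliding keys. If that matters, joining on a separator that cannot appear in the data is safer; a sketch (the "\x1f" separator is an arbitrary choice of mine):
my %seen;
my @uniq_data_safe = grep { !$seen{ join "\x1f", @$_ }++ } @array;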
For your case it will probably look like this:
my %datalist;
while (my $row = <$fh>) {
    unless ($. == 0) {
        my @data;
        {
            local $/;    # enable slurp
            # $1 = article number, $2 = quantity, $3 = unit
            @data = <$fh> =~ /\s*\d*\/\s*\d*\|\s*(.*?)\|.*?(?:.*?\|){4}\s*(\S*)\|(\S*).*\|/g;
        }
        $datalist{"@data"} = \@data;    # duplicate data sets collapse onto one key
        push(@filenames, $file);
        last;
    }
}
$numr++;
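Afterwards the distinct data sets are simply the hash values. If you also need to know which file produced each set, you could store both in the value; a sketch with a structure of my own choosing (//= keeps the first file seen for each key, Perl 5.10+):
$datalist{"@data"} //= { file => $file, data => \@data };

# later: every distinct data set exactly once
my @unique_sets = values %datalist;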
Upvotes: 1
Reputation: 7950
When you create @datalist, create a key for it and check for that key before you do the push, something like:
my %checkHash;    # declare this outside the file loop so it persists
my $key = arrayKey(\@datalist);
if (!$checkHash{$key}) {
    push(@arrayofarrays, [@datalist]);
    push(@filenames, $file);
    $checkHash{$key} = 1;
    last;
}
sub arrayKey {
    my $arrayRef = shift;
    my $output = '';    # start with '' rather than undef to avoid warnings
    for (@$arrayRef) {
        if (ref($_) eq 'ARRAY') {
            $output .= '[';
            $output .= arrayKey($_);
            $output .= ']';
        }
        else {
            $output .= "$_,";
        }
    }
    return $output;
}
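For instance (illustrative data of my own), two structurally identical nested arrays yield the same key:
my $k1 = arrayKey([1, 'John', [2, 3]]);    # "1,John,[2,3,]"
my $k2 = arrayKey([1, 'John', [2, 3]]);
print "duplicate\n" if $k1 eq $k2;         # prints "duplicate"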
Upvotes: 0