Reputation: 2861
I have an input file that contains many redundant records. I tried to write a program to remove the redundant ones, but some duplicates still remain in the output and I can't find out what's wrong with it.
ARGV[0] is the input file with redundant records
ARGV[1] is the output file, which should contain the input without the redundancy
open(Input, "<./$ARGV[0]");
open(Output, ">./$ARGV[1]");
while ( eof(Input) != 1 )
{
    push(@Records, readline(*Input));
}
close Input;
# Solution 2
for ( $i = 0; $i < $#Records; $i++ )
{
    for ( $j = $i + 1; $j < $#Records; $j++ )
    {
        if ( $Records[$i] eq $Records[$j] )
        {
            $Records[$j] = undef;
        }
    }
}
@Records = grep defined, @Records;
=begin
# Solution 1 has some problems
for ( $i = 0; $i < $#Records; $i++ )
{
    for ( $j = $i + 1; $j < $#Records; $j++ )
    {
        if ( $Records[$i] eq $Records[$j] )
        {
            splice @Records, $j, 1;
            $j = $j - 1;
        }
    }
}
=end
=cut
foreach $Each (@Records)
{
    print Output $Each;
}
close Output;
thanks
Upvotes: 0
Views: 95
Reputation: 126732
Your “solution 1” is the closest. Setting an array element to undef doesn't remove it, and will cause a warning message if you have warnings enabled, as you should. This solution checks each record at index $j, and either removes it using splice if it is a duplicate (which shuffles the remaining records down so that the next record to be compared is at the same index) or leaves it in place and skips over it by incrementing $j.
It is best practice to use lexical file handles (like $infh) rather than bareword file handles (like Input). You should also use the three-parameter form of open, and always check whether it has succeeded. Here I have used autodie to avoid checking every open explicitly; it will throw an exception if any open call fails.
use strict;
use warnings;
use autodie;
my ($infile, $outfile) = @ARGV;
my @records = do {
    open my $infh, '<', $infile;
    <$infh>;
};
for my $i (0 .. $#records - 1) {
    my $j = $i + 1;
    while ($j < @records) {
        if ($records[$j] eq $records[$i]) {
            splice @records, $j, 1;
        }
        else {
            ++$j;
        }
    }
}
open my $outfh, '>', $outfile;
print $outfh $_ for @records;
close $outfh;
An alternative solution using a hash looks like this:
use strict;
use warnings;
use autodie;
my ($infile, $outfile) = @ARGV;
open my $infh, '<', $infile;
open my $outfh, '>', $outfile;
my %seen;
while (<$infh>) {
    print $outfh $_ unless $seen{$_}++;
}
Upvotes: 1
Reputation: 13792
Here is a solution in more modern Perl:
open(my $fh_input, '<', $ARGV[0]) or die $!;
open(my $fh_output, '>', $ARGV[1]) or die $!;
my %records = ();
while ( my $line = <$fh_input> )
{
    $records{$line} = 1;
}
foreach my $record (keys %records)
{
    print $fh_output $record;
}
close $fh_input;
close $fh_output;
As you can see, I used a hash to avoid duplicates. Note that keys returns the records in no particular order, so this approach does not preserve the original line order.
Upvotes: 2
Reputation: 1342
You can simply use uniq() from List::MoreUtils (recent versions of List::Util, 1.45 and later, also provide it).
use List::MoreUtils qw(uniq);    # List::Util 1.45+ also exports uniq

my @records;
while ( eof(Input) != 1 )
{
    push(@records, readline(*Input));
}
close Input;

@records = uniq(@records);    ## Unique elements in @records
Please have a look at the List::MoreUtils documentation.
Upvotes: 1