user2131116

Reputation: 2861

Can't get rid of same records

I have an input file that contains many redundant records. I tried to write a program to remove the redundancy, but some duplicate records still remain and I can't figure out what's wrong.

ARGV[0] is the input file containing redundant records

ARGV[1] is the output file, which should contain the input records with the redundancy removed

open(Input,"<./$ARGV[0]");
open(Output,">./$ARGV[1]");

while( eof(Input) !=1)
{
    push(@Records,readline(*Input));
}
close Input;

# Solution 2
for($i=0;$i<$#Records;$i++)
{
    for($j=$i+1;$j<$#Records;$j++)
    {
        if($Records[$i] eq $Records[$j])
        {
            $Records[$j] = undef;
        }
    }
}

@Records = grep defined,@Records;

=begin
# Solution 1 have some problems
for($i=0;$i<$#Records;$i++)
{
    for($j=$i+1;$j<$#Records;$j++)
    {
        if($Records[$i] eq $Records[$j])
        {
            splice @Records,$j,1;
            $j = $j-1;  
        }
    }
}
=end
=cut

foreach $Each(@Records)
{
    print Output $Each;
}
close Output;

Thanks.

Upvotes: 0

Views: 95

Answers (3)

Borodin

Reputation: 126732

Your “solution 1” is the closest. Setting an array element to undef doesn't remove it, and will cause a warning message if you have warnings enabled as you should.

This solution checks each record at index $j and either removes it using splice if it is a duplicate (which will shuffle the remaining records down so that the next record to be compared will be at the same index) or leaves it in place and skips over it by incrementing $j.

It is best practice to use lexical file handles (like $infh) rather than bare word file handles (like Input). You should also use the three-parameter form of open, and always check whether it has succeeded. Here I have used autodie to avoid checking every open explicitly. It will throw an exception if any open call fails.

use strict;
use warnings;
use autodie;

my ($infile, $outfile) = @ARGV;

my @records = do {
    open my $infh, '<', $infile;
    <$infh>;
};

for my $i (0..$#records-1) {
    my $j = $i + 1;
    while ($j < @records) {
        if ($records[$j] eq $records[$i]) {
            splice @records, $j, 1;
        }
        else {
            ++$j;
        }
    }
}

open my $outfh, '>', $outfile;
print $outfh $_ for @records;
close $outfh;

An alternative solution using a hash looks like this:

use strict;
use warnings;
use autodie;

my ($infile, $outfile) = @ARGV;

open my $infh,  '<', $infile;
open my $outfh, '>', $outfile;

my %seen;

while (<$infh>) {
  print $outfh $_ unless $seen{$_}++;
}

Upvotes: 1

Miguel Prz

Reputation: 13792

Here is a more modern Perl solution:

open(my $fh_input, '<', $ARGV[0]) or die $!;
open(my $fh_output, '>', $ARGV[1]) or die $!;
my %records = ();

while( my $line = <$fh_input> )
{
   $records{$line} = 1;
}

foreach my $record(keys %records)
{
    print $fh_output $record;
}

close $fh_input;
close $fh_output;

As you can see, I used a hash to avoid duplicates.

Upvotes: 2

Krishnachandra Sharma

Reputation: 1342

You can simply use uniq().

use List::MoreUtils qw(uniq);   # uniq() is not a builtin; it comes from
                                # List::MoreUtils (or List::Util 1.45+)

my @records;
while( eof(Input) != 1 )
{
    push(@records, readline(*Input));
}
close Input;

@records = uniq(@records); ## Unique elements in @records

Please have a look at its documentation here.

Upvotes: 1
