Reputation: 19
How to remove duplicate lines?
My current code:
use strict;
use warnings;
my $input = "input.txt";
my $output = "output.txt";
my %seen;
open("OP",">$output") or die;
open("IP","<$input") or die;
while(my $string = <IP>) {
my @arr1 = join("",$string);
my @arr2 = grep { !$seen{$_}++ } @arr1;
print "@arr2\n";
print OP "@arr2\n";
}
close("IP");
close("OP");
Input:
india
australia
america
singapore
india
america
Expected output :
india
australia
america
singapore
Upvotes: 0
Views: 652
Reputation: 12347
Use this Perl one-liner to delete all duplicates, whether adjacent or not:
perl -ne 'print unless $seen{$_}++;' input.txt > output.txt
To delete only adjacent duplicates (as in the UNIX uniq command):
perl -ne 'print unless $_ eq $prev; $prev = $_; ' input.txt > output.txt
The Perl one-liners use these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
When the line is seen for the first time, $seen{$_} is evaluated first, and is false, so the line is printed. Then, $seen{$_} is incremented by one, which makes it true every time the line is seen again (thus the same line is not printed any more).
The first one-liner avoids reading the entire file into memory all at once, which could be important for inputs with lots of long duplicated lines. Only the first occurrence of every line is stored in memory, together with its number of occurrences.
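For readers who prefer a script to a one-liner, here is a minimal sketch of what the first one-liner does when written out. The helper name dedupe_lines and the in-memory filehandles are assumptions made for illustration; they are not part of the original command.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the one-liner): copy lines from $in to $out,
# skipping any line that has already been written once.
sub dedupe_lines {
    my ($in, $out) = @_;
    my %seen;    # line text => number of times seen so far
    while (my $line = <$in>) {
        # Post-increment returns the old count, which is 0 (false)
        # the first time a line is seen.
        print {$out} $line unless $seen{$line}++;
    }
    return;
}

# Demonstration on in-memory filehandles, so no real files are needed.
my $input = "india\naustralia\namerica\nsingapore\nindia\namerica\n";
open my $in,  '<', \$input     or die "Cannot read input: $!";
open my $out, '>', \my $result or die "Cannot write output: $!";
dedupe_lines($in, $out);
close $in;
close $out;
print $result;
```

Lines are compared together with their trailing newline, so a final line without a newline would not match an earlier copy that has one.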
Upvotes: 4
Reputation: 69244
You are making this all far too complicated. The main section of your code can be simplified to:
while (<IP>) {
print unless $seen{$_}++;
}
Or even:
print grep { ! $seen{$_}++ } <IP>;
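Put into a complete script, the loop version might look like the sketch below. It assumes lexical filehandles and three-argument open in place of the bareword handles from the question, and it uses a temporary directory as a stand-in for the real input.txt and output.txt so that it can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for the real files.
my $dir    = tempdir(CLEANUP => 1);
my $input  = "$dir/input.txt";
my $output = "$dir/output.txt";

# Write the sample data from the question.
open my $fh, '>', $input or die "Cannot write $input: $!";
print {$fh} "$_\n" for qw(india australia america singapore india america);
close $fh or die "Cannot close $input: $!";

my %seen;
open my $ip, '<', $input  or die "Cannot open $input: $!";
open my $op, '>', $output or die "Cannot open $output: $!";
while (my $line = <$ip>) {
    # Print each distinct line only the first time it appears.
    print {$op} $line unless $seen{$line}++;
}
close $ip;
close $op or die "Cannot close $output: $!";

# Read the result back to show what was written.
open my $check, '<', $output or die "Cannot open $output: $!";
my @kept = <$check>;
close $check;
print @kept;
```

The while loop holds one line at a time plus the %seen hash, whereas the grep form reads every line of the file into a list before filtering.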
Upvotes: 1
Reputation: 3222
I removed the unnecessary lines of code from your script.
Here is the updated script:
use strict; use warnings;
use Data::Dumper;
my %seen;
my @lines = <DATA>;
chomp @lines;
my @countries = grep { !$seen{$_}++ } @lines;
print Dumper(\@countries);
__DATA__
india
australia
america
singapore
india
america
Result:
$VAR1 = [
'india',
'australia',
'america',
'singapore'
];
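As an alternative to the grep idiom, the core module List::Util provides a uniq function that does the same job. The sketch below assumes List::Util 1.45 or newer, and the data is inlined rather than read from __DATA__.

```perl
use strict;
use warnings;
use List::Util qw(uniq);    # uniq requires List::Util 1.45 or newer

my @lines = qw(india australia america singapore india america);

# uniq keeps the first occurrence of each value and preserves order.
my @countries = uniq @lines;

print "$_\n" for @countries;
```

On an older List::Util the import fails, in which case the grep with %seen remains the portable choice.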
Upvotes: 2
Reputation: 6798
Please take a look at the following code snippet. You were very close to using the %seen hash correctly.
use strict;
use warnings;
use feature 'say';
my %seen;
my @uniq;
while( <DATA> ) {
chomp;
push @uniq, $_ unless $seen{$_};
$seen{$_} = 1;
}
say for @uniq;
__DATA__
india
australia
america
singapore
india
america
Output
india
australia
america
singapore
Upvotes: 2