I know that a hash can be used to remove duplicate lines from a file, and that this removes every duplicate. I used the following code to remove all duplicate lines in a file:
use strict;
use warnings;
my %lines;
while (<DATA>) {
    # Print a line only the first time it is seen
    print if not $lines{$_}++;
}
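The `$lines{$_}++` test works because post-increment returns the old value: 0 (false) the first time a line is seen, then 1, 2, ... (true) afterwards. A minimal sketch of the same idea wrapped in a sub, assuming the lines are already in a list rather than read from `DATA` (the name `uniq_lines` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines in @_ with repeats removed,
# keeping the first occurrence of each line.
sub uniq_lines {
    my %seen;
    # $seen{$_}++ yields the OLD count, so !$seen{$_}++ is true
    # only the first time a given line appears.
    return grep { !$seen{$_}++ } @_;
}

my @lines = qw(line1 line2 line3 line1 line2 line4 line5);
print "$_\n" for uniq_lines(@lines);   # line1 line2 line3 line4 line5, one per line
```

This removes every duplicate, which is exactly the behaviour the question wants to restrict.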
But I need to remove only the duplicates of lines that match a pattern. Sample input file:
line1
line2
line3
line1 #duplicate line
line2 #duplicate line
line4
line5
Although both line1 and line2 are duplicated, I only want to remove the duplicate of line1.
Desired output:
line1
line2
line3
line2 #this duplicated line needs to be kept
line4
line5
How can I combine a hash and a regex to achieve this?
Upvotes: 1
Reputation: 126722
This solution lets you set up a regex pattern, $check_dups, that defines which lines are subject to duplicate removal. If a line matches that pattern, it is removed if it has been seen before; all other lines are retained.
Here, only duplicates of lines that match /line1/ are removed, as required by the example in your question.
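use strict;
use warnings 'all';
# Only lines matching this pattern are subject to duplicate removal
my $check_dups = qr/line1/;
my %seen;
while ( <DATA> ) {
    if ( /$check_dups/ ) {
        # Print a matching line only the first time it is seen
        print unless $seen{$_}++;
    }
    else {
        # Non-matching lines are always printed, duplicates included
        print;
    }
}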
__DATA__
line1
line2
line3
line1
line2
line4
line5
Output:
line1
line2
line3
line2
line4
line5
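One thing to watch: the unanchored qr/line1/ also matches lines such as "line10" or "xline1". A minimal sketch of the same technique as a reusable sub, assuming you want the pattern to match the whole line (the name `dedup_matching` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: keep every line, except that a line matching $re
# is dropped if an identical line has already been kept.
sub dedup_matching {
    my ($re, @lines) = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        # Only lines matching the pattern are tracked in %seen
        next if $line =~ $re && $seen{$line}++;
        push @out, $line;
    }
    return @out;
}

# Anchored: matches "line1" exactly, so "line10" is NOT deduplicated
my $check_dups = qr/^line1$/;
my @input = qw(line1 line2 line3 line1 line2 line4 line5);
print "$_\n" for dedup_matching($check_dups, @input);
```

With the sample input this keeps the second line2 and drops the second line1, matching the output above. Because `$` also matches before a trailing newline, the anchored pattern still works on unchomped lines read from a file.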
Upvotes: 0
Reputation: 2808
Assuming that a duplicate is exempt from deletion whenever the line immediately before it was removed, and that you want trailing comments ignored when comparing lines:
use v5.12;
use warnings;
my %lines;
my $previous_line_removed = 0;
while (<>) {
    my $original_line = $_;
    chomp;
    s/\s*#.*?$//;    # ignore trailing comments when comparing lines
    # Remove a line seen before, unless the previous line was removed
    if ( $lines{$_}++ && !$previous_line_removed ) {
        $previous_line_removed = 1;
    }
    else {
        print $original_line;
        $previous_line_removed = 0;
    }
}
When fed the sample input from the question, this prints:
line1
line2
line3
line2 #duplicate line
line4
line5
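If the "#duplicate line" annotations really are part of the file, a different rule from the one above can handle them: keep the pattern-based selection from the first answer, but compare lines after stripping the trailing comment. A minimal sketch of that combination, assuming comments start at the first "#" (the name `dedup_ignoring_comments` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: lines count as duplicates if they are equal after
# removing a trailing "# comment" and surrounding whitespace. Only lines
# whose stripped form matches $re are deduplicated; the original text of
# each kept line is returned unchanged.
sub dedup_ignoring_comments {
    my ($re, @lines) = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        (my $key = $line) =~ s/\s*#.*$//;   # drop trailing comment
        $key =~ s/^\s+|\s+$//g;             # trim whitespace (and newline)
        next if $key =~ $re && $seen{$key}++;
        push @out, $line;
    }
    return @out;
}

my @input = (
    'line1', 'line2', 'line3',
    'line1 #duplicate line', 'line2 #duplicate line',
    'line4', 'line5',
);
print "$_\n" for dedup_ignoring_comments(qr/^line1$/, @input);
```

Unlike the previous-line trigger, this does not depend on line order: any repeat of line1 is dropped, and every line2 is kept.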
Upvotes: 0
Reputation: 2550
my %lines;
while (<DATA>) {
    # Skip a line if it has been seen before AND matches the pattern
    next if $lines{$_}++ and /^line1/;
    print;
}
The part /^line1/ is a regular expression that describes which lines to check. See http://perldoc.perl.org/perlre.html for details.
The line containing next skips every line that is a duplicate and matches whatever pattern you choose. You can easily negate the test, e.g. ! /^line1/, to skip duplicates of every line except those matching line1.
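The positive and negated forms of the test can be compared side by side. A minimal sketch, assuming the choice of lines is passed in as a code reference (the helper name `dedup_if` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: drop a repeated line only when $should_dedup->($line)
# returns true. Every line is counted in %seen, as in the loop above.
sub dedup_if {
    my ($should_dedup, @lines) = @_;
    my %seen;
    return grep { !( $seen{$_}++ && $should_dedup->($_) ) } @lines;
}

my @input = qw(line1 line2 line3 line1 line2 line4 line5);
# Remove duplicates of lines matching /^line1/ only
my @a = dedup_if(sub { $_[0] =~ /^line1/ }, @input);
# Negated: remove duplicates of every line EXCEPT those matching /^line1/
my @b = dedup_if(sub { $_[0] !~ /^line1/ }, @input);
print "@a\n";   # line1 line2 line3 line2 line4 line5
print "@b\n";   # line1 line2 line3 line1 line4 line5
```

Passing a code reference keeps the hash bookkeeping in one place while letting the caller decide, with any test at all, which lines are deduplicated.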