Sumathi Gokul

Reputation: 111

Perl script to remove only duplicate lines that match a pattern?

I know that a hash can be used to remove duplicate lines from a file, and that this removes every duplicate line. I used the following code to remove all duplicate lines from a file:

my %lines;
while (<DATA>) {
    print if not $lines{$_}++;   # print only the first occurrence of each line
}
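The idiom works because $lines{$_}++ returns the old count: 0 (false) the first time a line is seen, and a true value on every later occurrence. Below is a minimal sketch of the same idea wrapped in a hypothetical subroutine, dedup_all. It runs on an in-memory list instead of <DATA> so it can be tried without a file:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each line.
sub dedup_all {
    my @lines = @_;
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a line appears, so grep keeps it
    return grep { !$seen{$_}++ } @lines;
}

my @input = qw(line1 line2 line3 line1 line2 line4 line5);
print "$_\n" for dedup_all(@input);
```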

However, I need to remove only those duplicate lines that match a given pattern. Sample input file:

line1
line2
line3
line1 #duplicate line
line2 #duplicate line
line4
line5

Although both line1 and line2 are duplicated, I only want to remove the duplicate of line1.

Desired output:

line1
line2
line3
line2 #this duplicate line needs to be retained
line4
line5

Any suggestions on how to combine a hash and a regex to achieve this?

Upvotes: 1

Views: 996

Answers (3)

Borodin

Reputation: 126722

This solution lets you set up a regex pattern, $check_dups, that defines which lines are subject to duplicate removal. A line that matches the pattern is removed if it has been seen before. All other lines are retained.

Here, only duplicates of lines that match /line1/ are removed, as the example in your question requires.

use strict;
use warnings 'all';

my $check_dups = qr/line1/;

my %seen;

while ( <DATA> ) {
    if ( /$check_dups/ ) {
        print unless $seen{$_}++;
    }
    else {
        print;
    }
}

__DATA__
line1
line2
line3
line1
line2
line4
line5

Output:

line1
line2
line3
line2
line4
line5
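Note that /line1/ is unanchored, so it would also treat lines such as line10 or xline1 as candidates for removal. Below is a minimal sketch of the same approach with the pattern anchored to the whole line (^line1$). It is wrapped in a hypothetical subroutine, remove_matched_dups, which takes the pattern and a list of lines:

```perl
use strict;
use warnings;

# Hypothetical helper: drop repeats only of lines matching $check_dups;
# every other line is passed through unchanged.
sub remove_matched_dups {
    my ($check_dups, @lines) = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        # a matching line is kept only the first time it is seen
        next if $line =~ $check_dups && $seen{$line}++;
        push @out, $line;
    }
    return @out;
}

# Anchored, so "line10" is not treated as a duplicate candidate
my $check_dups = qr/^line1$/;
my @input = qw(line1 line2 line3 line1 line2 line4 line5 line10 line10);
print "$_\n" for remove_matched_dups($check_dups, @input);
```

Because $ also matches just before a trailing newline, the anchored pattern works whether or not the lines have been chomped.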

Upvotes: 0

Marty

Reputation: 2808

Assuming that a duplicate is exempt from deletion whenever the line immediately before it was removed, and that you want trailing comments ignored when comparing lines:

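use v5.12;
use warnings;

my %lines;                      # comment-stripped lines seen so far
my $previous_line_removed = 0;  # was the line before this one dropped?
while (<>) {
    my $original_line = $_;
    chomp;
    s/\s*#.*$//;                # ignore trailing comments when comparing
    if ( $lines{$_}++ && !$previous_line_removed ) {
        # seen before and the previous line was kept: drop this one
        $previous_line_removed = 1;
    }
    else {
        print $original_line;   # print the line with its comment intact
        $previous_line_removed = 0;
    }
}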
When fed the sample input from the question, this prints:
line1
line2
line3
line2 #duplicate line
line4
line5
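Stripping trailing # comments before comparing is what lets the question's annotated sample (line1 #duplicate line) match its earlier copy. Below is a minimal sketch that combines that normalisation with a pattern-based check in place of the previous-line trigger. The subroutine name is hypothetical, and the sketch assumes that everything from # onward is a comment:

```perl
use strict;
use warnings;

# Hypothetical helper: remove repeats of lines whose comment-stripped text
# matches $pattern; kept lines are returned unchanged, comments included.
sub remove_matched_dups_ignoring_comments {
    my ($pattern, @lines) = @_;
    my %seen;
    my @out;
    for my $line (@lines) {
        (my $key = $line) =~ s/\s*#.*$//;   # assumption: '#' starts a comment
        next if $key =~ $pattern && $seen{$key}++;
        push @out, $line;
    }
    return @out;
}

my @input = ('line1', 'line2', 'line3',
             'line1 #duplicate line', 'line2 #duplicate line',
             'line4', 'line5');
print "$_\n" for remove_matched_dups_ignoring_comments(qr/^line1$/, @input);
```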

Upvotes: 0

Sebastian

Reputation: 2550

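use strict;
use warnings;

my %lines;
while (<DATA>) {
    # skip a line if it has been seen before AND it matches the pattern
    next if $lines{$_}++ and /^line1/;
    print;
}

The part /^line1/ is a regular expression that selects which duplicates are removed. Only repeated lines starting with line1 are dropped, and all other lines are printed. See http://perldoc.perl.org/perlre.html for details.

The next statement skips every line that is a duplicate and also matches the pattern. You can easily negate the test, e.g. !/^line2/, to remove every duplicate except those starting with line2.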
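To illustrate the negated form, here is a minimal sketch that runs the same next-if test over an in-memory list, so it can be checked without a __DATA__ section. With !/^line2/, every duplicate is removed except those starting with line2. The subroutine name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: skip duplicates unless they start with "line2".
sub keep_line2_dups {
    my %lines;
    my @out;
    for (@_) {
        # $lines{$_}++ runs first, so every line is counted;
        # 'and' then checks the pattern only for repeats
        next if $lines{$_}++ and !/^line2/;
        push @out, $_;
    }
    return @out;
}

print "$_\n" for keep_line2_dups(qw(line1 line2 line3 line1 line2 line4 line5));
```

For the sample data, this keeps the second line2 and drops the second line1, which is the same result as the question asks for.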

Upvotes: 0
