Reputation: 509

Perl: Don't print any lines that are duplicates

What is a Perl one-liner that prints only the lines that appear exactly once in a file (that is, a line that appears more than once should not be printed at all, leaving only the truly unique lines)?

For example, if I have a file that contains duplicate lines:

line1
line2
line2
line3
line1
line4
line5

The output should be:

line3
line4
line5

I can do perl -ne 'print if $a{$_}++' file to see only the lines that are duplicates...

line2
line1

I can swap the if for its antonym, unless, and see only one occurrence of each line in the file...

perl -ne 'print unless $a{$_}++' file
line1
line2
line3
line4
line5

I'm assuming I have to slurp the entire file and split it on \n into individual lines, maybe storing them in a hash? I'm just not sure how to do that.

Upvotes: 1

Answers (4)

Mohima Chaudhuri

Reputation: 109

This should work, although it is a Unix shell solution rather than a Perl one:

sort file1|uniq -u

line3
line4
line5

where file1 has:
line1
line2
line2
line3
line1
line4
line5

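The -u option of the uniq command prints only the lines that are not repeated. uniq compares adjacent lines only, so the input has to be sorted first, which also means the output comes out in sorted order rather than in the original file order.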
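To see why the sort step matters, here is a small sketch using the sample data from the question. The temporary file created with mktemp is only a stand-in for file1. Without sort, the two line1 entries are not adjacent, so uniq -u does not recognise them as duplicates.

```shell
# Hypothetical temp file holding the question's example input
f=$(mktemp)
printf 'line1\nline2\nline2\nline3\nline1\nline4\nline5\n' > "$f"

# Without sorting: only the adjacent pair of line2 entries is detected
unsorted=$(uniq -u "$f")

# With sorting: all duplicates become adjacent, so all of them are dropped
sorted=$(sort "$f" | uniq -u)

echo "without sort:"; printf '%s\n' "$unsorted"
echo "with sort:";    printf '%s\n' "$sorted"
rm -f "$f"
```

The unsorted run still prints line1 twice, while the sorted run prints only line3, line4 and line5.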

Upvotes: 1

Borodin

Reputation: 126772

As mentioned above, to filter the file in this way and keep the lines in order, you need either to read through the file twice or to store line number information while you read.

This one-liner seems the best option:

perl -e '@a = @ARGV; ++$c{$_} while <>; @ARGV = @a; $c{$_} == 1 and print while <>;'  myfile.txt

Output:

line3
line4
line5

This is a slightly shorter alternative, but it uses roughly double the memory, because the lines are held both in the array @l and as keys of the hash %c:

perl -e '@l = <>; ++$c{$_} for @l; $c{$_} == 1 and print for @l;' myfile.txt

Upvotes: 2

Kosh

Reputation: 18444

Another way to do it:

perl -e'@a=<>; $d{$_}++ for @a; print grep {$d{$_}<2} @a' file

Upvotes: 4

ikegami

Reputation: 386696

You only know if a line is unique after you've read all the lines, so you can't possibly start printing before you've reached the end of the file!

# Varying order

perl -nle'++$lines{$_}; END { print for grep $lines{$_}==1, keys %lines; }' file

# Sorted

perl -nle'++$lines{$_}; END { print for sort grep $lines{$_}==1, keys %lines; }' file

# Original order

perl -nle'
   if ( my $orig_line_num = $line_nums_by_line{$_} ) {
      $lines_by_line_num[$orig_line_num] = undef;
   } else {
      $lines_by_line_num[$.] = $_;
      $line_nums_by_line{$_} = $.;
   }

   END { print for grep defined, @lines_by_line_num; }
' file

Upvotes: 2
