Reputation: 159
I know this should be pretty simple and the shell version is something like:
$ sort example.txt | uniq -u
in order to remove duplicate lines from a file. How would I go about doing this in Perl?
Upvotes: 0
Views: 2149
Reputation: 753615
The interesting spin on this question is the uniq -u! I don't think the other answers I've seen tackle this; they deal with sort -u example.txt or (somewhat wastefully) sort example.txt | uniq.
The difference is that the -u option eliminates all occurrences of duplicated lines, so the output consists only of lines that appear exactly once.
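For example, if the input contains the lines a, a, b, then sort example.txt | uniq -u prints only b (the line that occurs just once), whereas sort -u example.txt prints both a and b.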
To tackle this, you need to know how many times each name appears, and then you need to print the names that appear just once. Assuming the list is to be read from standard input, then this code does the trick:
my %counts;
while (<>)
{
    chomp;              # strip the newline so identical names compare equal
    $counts{$_}++;      # tally how many times each name appears
}
foreach my $name (sort keys %counts)
{
    print "$name\n" if $counts{$name} == 1;   # only names that appeared once
}
Or, using grep:
my %counts;
while (<>)
{
    chomp;
    $counts{$_}++;
}
{
    local $, = "\n";    # separate the printed names with newlines
    print grep { $counts{$_} == 1 } sort keys %counts;
}
Or, if you don't need to remove the newlines (because you're only going to print the names):
my %counts;
$counts{$_}++ for (<>);
print grep { $counts{$_} == 1 } sort keys %counts;
If you do in fact want every name that appears in the input to appear in the output (but only once), then any of the other solutions will do the trick, or will do so with minimal adaptation. In fact, since the input lines will end with a newline, you can generate the answer in just two lines:
my %counts = map { $_, 1 } <>;
print sort keys %counts;
No, you can't do it in one line by simply replacing %counts in the print line with the map from the first line:
print sort keys map { $_, 1 } <>;
You get the error:
Type of arg 1 to keys must be hash or array (not map iterator) at ...
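That's because keys needs a real hash (or array) to operate on, not an arbitrary list. If a single statement is really wanted, a possible workaround (a sketch in the same spirit, not from the original answer) is to build an anonymous hash and dereference it in place, since keys does accept a hash dereference:
print sort keys %{ { map { $_, 1 } <> } };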
Upvotes: 1
Reputation: 4444
First of all, sort -u xxx.txt would have been smarter than sort | uniq -u.
Second, perl -ne 'print unless $seen{$_}++' is prone to integer overflow, so a more sophisticated form, perl -ne 'if(!$seen{$_}){print;$seen{$_}=1}', seems preferable.
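For reference, the first one-liner expands to roughly this standalone script (the -n switch wraps the code in a while (<>) loop; the my %seen declaration is only needed under strict):
#!/usr/bin/perl
use strict;
use warnings;

# Equivalent of: perl -ne 'print unless $seen{$_}++'
# Prints each line the first time it appears, preserving input order.
my %seen;
while (<>)
{
    print unless $seen{$_}++;
}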
Upvotes: 0
Reputation: 98388
Are you wanting to update a list of files to remove duplicate lines? Or process a list of files, ignoring duplicate lines? Or remove duplicate filenames from a list?
Assuming the latter:
my %seen;
@filenames = grep !$seen{$_}++, @filenames;   # keep only the first occurrence of each name
or see the other solutions in perldoc -q duplicate.
Upvotes: 0
Reputation: 8895
or use the 'uniq' sub from the List::MoreUtils module after reading the whole file into a list (although it's not a good solution, since it slurps everything into memory)
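A minimal sketch of that approach (it assumes List::MoreUtils is installed from CPAN; note that uniq keeps the first occurrence of every line, which is sort -u behaviour rather than uniq -u behaviour):
use strict;
use warnings;
use List::MoreUtils qw(uniq);

# Read the whole file into memory, then keep each distinct
# line once, preserving the order of first appearance.
my @lines  = <>;
my @unique = uniq @lines;
print @unique;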
Upvotes: 0