Reputation: 91
I have a Perl script, run from crontab, that generates a file full of duplicate entries, because each run rewrites information that was already written.
I would normally run sort -u on the file, but I would like to do the equivalent at the end of the Perl script itself.
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
...
#!/usr/bin/perl
# Libraries
use strict;
use warnings 'all';
%lines = ();
# Remove duplicate
open( TMP_GL_OUTPUT, '>', $OUTPUT_FILE ) or die $!;
while ( <TMP_GL_OUTPUT> ) {
$lines{$_}++;
}
open( OUTFILE, '>', $TMPOUTPUT_FILE ) or die $!;
print OUTFILE keys %lines;
close( OUTFILE );
close( TMP_GL_OUTPUT );
Where am I going wrong? In shell it feels shorter than in Perl.
sort -u $TMPOUTPUT_FILE > $OUTPUT_FILE
As suggested by user ikegamy, I did the following:
move $OUTPUT_FILE, $TMPOUTPUT_FILE; # Move (rename) the file; needs File::Copy
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE; # Remove duplicate
unlink $TMPOUTPUT_FILE;
Upvotes: 2
Views: 4446
Reputation: 30971
Your code looks almost OK.
My only suggestion is to chomp each line before you save it as a key in the hash.
The reason is that the last line, if it is not terminated with a \n, may look just the same as one of the previous lines. Without chomp, the previous line would contain the terminating \n, whereas the last one would not.
The result is that these two lines become different keys in the hash.
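As a quick illustration of the chomp point, the sketch below counts hash keys with and without chomp. The helper name count_keys and the two-line input are made up for the demonstration; they are not part of the original script.

```perl
use strict;
use warnings;

# Count distinct hash keys, optionally chomping each line first.
# (count_keys is a hypothetical helper used only for this demonstration.)
sub count_keys {
    my ($chomp, @input) = @_;
    my %lines;
    for my $line (@input) {
        chomp $line if $chomp;
        $lines{$line}++;
    }
    return scalar keys %lines;
}

# Made-up input: the same entry twice, but the last line lacks a trailing newline.
my @input = (
    "10/11/2017 00:00:01:002;Lupus\n",
    "10/11/2017 00:00:01:002;Lupus",
);

printf "without chomp: %d key(s)\n", count_keys(0, @input);   # 2
printf "with chomp:    %d key(s)\n", count_keys(1, @input);   # 1
```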
Compare my working example program, presented below, with yours. There are no other significant differences, apart from reading from __DATA__ and writing to the console.
For demonstration purposes, my program contains two variants of the printout: one with the key values (repetition counts) and another printing just the keys. In your program, keep only the second printout.
use strict; use warnings; use feature qw(say);
my %lines;
while (<DATA>) {
    chomp;
    $lines{$_}++;
}
while (my ($key, $val) = each %lines) {
    printf "%-32s / %d\n", $key, $val;
}
say '========';
foreach my $key (keys %lines) {
    say $key;
}
__DATA__
10/10/2017 00:01:39:000;Sagitter
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
10/12/2017 00:09:00:459;Sagitter
10/13/2017 01:11:03:009;Lupus
12/13/2017 04:29:00:609;Ariet
10/11/2017 00:00:01:002;Lupus
10/12/2017 00:03:14:109;Leon
Your code assigns no values to $OUTPUT_FILE and $TMPOUTPUT_FILE, and you didn't even declare these variables, but I assume that you did so in your actual code.
Another detail is that %lines should be preceded with my; otherwise, since you put use strict; in the script, compilation fails with an error.
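Putting these points together, a corrected version of the script from the question might look like the sketch below. It is only a sketch: the sub name remove_duplicates and its two parameters are assumptions introduced here. Note one further change that matters: the input file is opened with '<', because opening it with '>' (as in the question) truncates the file before anything can be read from it.

```perl
use strict;
use warnings;

# Read $in, drop duplicate lines, and write the remaining lines to $out.
# Returns the number of distinct lines. (Hypothetical helper, not from the post.)
sub remove_duplicates {
    my ($in, $out) = @_;
    my %lines;

    # '<' opens for reading; '>' would empty the file before it is read.
    open( my $in_fh, '<', $in ) or die "Cannot read $in: $!";
    while ( my $line = <$in_fh> ) {
        chomp $line;
        $lines{$line}++;
    }
    close($in_fh);

    open( my $out_fh, '>', $out ) or die "Cannot write $out: $!";
    print {$out_fh} "$_\n" for keys %lines;    # hash order is arbitrary
    close($out_fh) or die "Cannot close $out: $!";

    return scalar keys %lines;
}
```

With the variable names from the question, the call would be remove_duplicates($OUTPUT_FILE, $TMPOUTPUT_FILE), assuming both variables are declared and hold file names.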
There is a quicker and shorter solution than yours.
Instead of writing lines to a hash and printing them as late as in the second step, you can do it in a single loop:
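A minimal sketch of such a single loop, consistent with the one-liner in this answer, could look like this. The sub name print_unique and the use of filehandle arguments are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;

# Print each line the first time it is seen; later copies are skipped.
# Unlike printing "keys %lines", this keeps the lines in their original order.
sub print_unique {
    my ($in_fh, $out_fh) = @_;
    my %lines;
    while ( my $line = <$in_fh> ) {
        chomp $line;
        # $lines{$line}++ is 0 (false) only on the first occurrence.
        print {$out_fh} "$line\n" unless $lines{$line}++;
    }
    return;
}
```

To filter standard input to standard output, it could be called as print_unique(\*STDIN, \*STDOUT).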
You can even write this program as a Perl one-liner:
perl -lne"print if !$lines{$_}++" input.txt
If you run the above command from the Windows cmd, it prints the output to the console. On Linux, use single quotes instead of double quotes; otherwise the shell expands $lines and $_ before Perl sees them.
You can of course redirect the output to a file by adding > output.txt to the above command.
The -n option runs the code once for each input line, and the -l option chomps each line and adds the newline back when printing.
For other details about Perl one-liners, see the perlrun documentation or search the web.
Upvotes: 0
Reputation: 2297
List::Util is a core module.
use List::Util 'uniq';
print for uniq <>
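A slightly expanded sketch of the same idea follows, using made-up input instead of <> so that it runs on its own. As far as I know, uniq was added in List::Util 1.45, so the version is requested explicitly here; uniq keeps the first occurrence of each value and preserves the input order.

```perl
use strict;
use warnings;
use List::Util 1.45 'uniq';    # uniq needs List::Util 1.45 or later

# Made-up sample lines; in the real script these would come from the file.
my @lines = (
    "10/11/2017 00:00:01:002;Lupus\n",
    "10/12/2017 00:03:14:109;Leon\n",
    "10/11/2017 00:00:01:002;Lupus\n",
);

# Prints the Lupus line once, then the Leon line.
print for uniq @lines;
```

Because the lines are not chomped here, a final line without a trailing newline would still count as different from an otherwise identical earlier line.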
Upvotes: 1
Reputation: 385655
I think you are asking why your Perl program is longer than your shell script.
First of all, your shell script does something completely different than your Perl program.
The Perl equivalent to
sort -u -- "$TMPOUTPUT_FILE" > "$OUTPUT_FILE"
is
use IPC::Run qw( run );
run [ 'sort', '-u', '--', $TMPOUTPUT_FILE ], '>', $OUTPUT_FILE;
(There are differences in error handling between these two.)
They're not that different in length.
This brings up the second difference. The shell specializes in executing programs, but Perl is a general purpose language. It would be surprising if it wasn't longer in Perl!
(Now try comparing the size of your Perl program to the source of sort
...)
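To illustrate the first point: sort -u both sorts the lines and removes duplicates, while the hash-based program in the question only removes duplicates and prints the keys in arbitrary order. A rough pure-Perl counterpart of sort -u might look like the sketch below. The sub name sort_unique is an assumption, and Perl's default sort compares plain strings, so the ordering can differ from what sort produces under some locales.

```perl
use strict;
use warnings;

# Return the distinct values of the argument list in sorted order.
# (Hypothetical helper; a rough counterpart of "sort -u" for in-memory lines.)
sub sort_unique {
    my %seen;
    my @unique = grep { !$seen{$_}++ } @_;
    return sort @unique;
}

# Made-up sample data in the format used in the question.
my @lines = (
    "10/12/2017 00:03:14:109;Leon\n",
    "10/11/2017 00:00:01:002;Lupus\n",
    "10/12/2017 00:03:14:109;Leon\n",
);

print for sort_unique(@lines);
```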
Upvotes: 6