Reputation: 31
I have a file like below
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
I want the output to look like this
1 B C D
2 K B D
1 B C D
2 K B D
1 B C D
2 K B D
Sort commands don't work, so I tried writing a Perl program -
use strict;
use Data::Dumper;
my $file1 = <$ARGV[0]>;
open (IF2, "$file1") || die "Cannot open the file\n";
open (OUT, ">$file1.out") || die "Cannot open the out file\n";
my $k = 0;my $i=0;
my @line;
my $m;
my @line2;
while ( chomp($m = <IF2>) ) {
my $count = 0;
@line2 = split(/\t/,$m);#<stdin>;
my $l = length @line2;print $l;<stdin>;
for (my $x = 0; $x < $l;$x++) {
my $k = 0;
for (my $y = 0;$y < $l; $y) {
$i++;
#
if ($count == 0)
{
print OUT "\t$line2[$x]";
$count++;
}
if ($count != 0 && $x != $y)
{
if ($line2[$x] eq $line2[$y])
{
$k++;
}
}
}
if ($k == 0)
{
print OUT "\t$line2[$x]";
}
}
print OUT "\n";
}
print $i;
close IF2;
close OUT;
But it didn't work. Can some one help?
Upvotes: 3
Views: 361
Reputation: 66964
Note The input and output examples in the question were edited: now the data is consistent with the title, asking to remove all duplicates. (See the original question for how it looked.) I am leaving the answer to the original question as well, until we hear from the poster.
I would like to make a general remark first.
Your posted code is an honest attempt to write a C-style program in Perl. I recommend to resist that and to learn how to use Perl instead. It does take a little investment of time and effort but it goes quick and pays off quick. As you can see below, it makes many things incomparably easier.
Even so, there are issues with the posted code but I can't write a code review now.
Edited question
We need to remove all duplicates from each line. One way: split the line by whitespace then remove duplicates from the list, which is a standard task with ready solutions.
use warnings;
use strict;
use List::MoreUtils qw(uniq);
my $file = '...';
my $fileout = '...';
open my $fh, '<', $filen or die "Can't open $file: $!";
open my $fh_out, '>', $fileout or die "Can't open $fileout: $!";
while (<$fh>)
{
my @unique = uniq split;
print $fh_out "@unique\n";
}
close $fh;
close $fh_out;
The line with spaces is restored by printing the quoted array, when spaces (or whatever is in $"
) are inserted between elements . The other option is to join the resulting list
my $res = join ' ', uniq split;
print $fh_out $res, "\n";
or just print $fh_out join(' ', uniq split), "\n";
.
This uses uniq
from List::MoreUtils module. Note from uniq
that
The order of elements in the returned list is the same as in LIST.
Once List::MoreUtils
got mentioned, take note of the core module List::Util as well.
A one-liner version
perl -MList::MoreUtils=uniq -lne'print join " ", uniq split' input > output
or
perl -MList::MoreUtils=uniq -lane'print join " ", uniq @F' input > output
See Command switches in perlrun
Original question (see it in the edit history)
I was answering the question with the belief that this was the input
1 B B C D B 2 K B D D K 1 B B C D B 2 K B D D K 1 B B C D B 2 K B D D K
and this was the desired output
1 B C D 2 K B D 1 B C D 2 K B D 1 B C D 2 K B D
By your desired output you want to remove only adjacent duplicates (not get "uniq" as the title says).
For that you can use the ability of regex to match repeated patterns, by using backreferences. First we need to strip all spaces, and we'll put them back in the end. For example
use warnings;
use strict;
my $file = '...';
my $fileout = '...';
open my $fh, '<', $filen or die "Can't open $file: $!";
open my $fh_out, '>', $fileout or die "Can't open $fileout: $!";
while (my $line = <$fh>)
{
$line =~ s/\s*//g; # remove spaces /
$line =~ s/(.)\1+/$1/g; # remove adjacent duplicates
$line =~ s/(.)/$1 /g; # restore space
print $fh_out $line;
}
close $fh;
close $fh_out;
The .
matches any character, replace it with something more restrictive if needed (for example \w
, for a "word" character). See perlretut. Note that we cannot restore space along with replacement (like s/(.)\1+/$1 /g
) since non-repeating chars aren't matched and don't get the space back.
This can be done in more concise ways.
A one-liner version
perl -pe's/\s*//g; s/(.)\1+/$1/g; s/(.)/$1 /g' input > output
See Command switches in perlrun
Upvotes: 4
Reputation: 126772
I suggest something like this. It finds all occurrences of whitespace followed by non-whitespace, and checks whether the non-whitespace has been seen before on the current line. The matched substring is deleted if the non-whitespace has been seen before, otherwise it is left untouched
use strict;
use warnings 'all';
while ( <DATA> ) {
my %seen;
s/(\s+(\S+))/ $seen{$2}++ ? '' : $1 /eg;
print;
}
__DATA__
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
1 B B C D B
2 K B D D K
1 B C D
2 K B D
1 B C D
2 K B D
1 B C D
2 K B D
This may be done in a one-liner like this
perl -pe 'my %s; s/(\s+(\S+))/ $s{$2}++ ? "" : $1 /eg' myfile
Upvotes: 2