Reputation: 287
This is my tab-delimited input file:
Name<tab>Street<tab>Address
This is how I want my output file to look:
Street<tab>Address<tab>Address
(Yes, the last column is duplicated on purpose.) My output file looks like this instead:
Street<tab>Address
<tab>Address
What is going on with Perl? This is my code:
open (IN, $ARGV[0]);
open (OUT, ">output.txt");
while ($line = <IN>){
chomp $line;
@line=split/\t/,$line;
$line[2]=~s/\n//g;
print OUT $line[1]."\t".$line[2]."\t".$line[2]."\n";
}
close( OUT);
Upvotes: 1
Views: 355
Reputation: 126722
If you know beforehand the origin of your data file, and know it to be a DOS-like file that terminates records with CR LF, you can use the PerlIO crlf layer when you open the file, like this:
open my $in, '<:crlf', $ARGV[0] or die $!;
Then all records will appear to end in just "\n" when they are read on a Linux system.
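As a rough check of the crlf layer described above, the sketch below writes a one-record DOS-style file and reads it back through '<:crlf'. The temporary file and the variable names ($path, $has_cr) are assumptions made for illustration; they are not part of the original question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small DOS-style file. The :raw layer keeps the CR LF bytes
# exactly as given, whatever platform this runs on.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "Name\tStreet\tAddress\r\n";
close $tmp or die $!;

# Read it back through the crlf layer, so CR LF is presented as "\n".
open my $in, '<:crlf', $path or die $!;
my $line = <$in>;
close $in or die $!;

chomp $line;
my @fields = split /\t/, $line;

# Report whether a carriage return survived in the last field.
my $has_cr = $fields[2] =~ /\r/ ? 'yes' : 'no';
print "last field contains CR: $has_cr\n";
```

If the layer behaves as described, the last field should come back without a trailing carriage return, so a plain chomp is enough.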
A general solution to this problem is to install PerlIO::eol. Then you can write
open my $in, '<:raw:eol(LF)', $ARGV[0] or die $!;
and the line ending will always be "\n" regardless of the origin of the file, and regardless of the platform where Perl is running.
Upvotes: 2
Reputation: 10314
Another way to avoid end-of-line problems is to capture only the characters you're interested in:
open (IN, $ARGV[0]);
open (OUT, ">output.txt");
while (<IN>) {
print OUT "$1\t$2\t$2\n" if /^\w+\t(\w+)\t(\w+)\s*/;  # capture columns 2 and 3, skip column 1
}
close( OUT);
Upvotes: 0
Reputation: 126722
First of all, you should always use strict and use warnings for even the most trivial programs. You will also need to declare each of your variables using my as close as possible to their first use. You should also use lexical file handles and the three-parameter form of open, and check the success of every open call, calling die with a string that includes $! to show the reason for the failure.
Note also that there is no need to explicitly open files named on the command line that appear in @ARGV: you can just read from them using <>.
As others have said, it looks like you are reading a file of DOS or Windows origin on a Linux system. Instead of using chomp, you can remove all trailing whitespace characters from each line using s/\s+\z//. Since CR and LF both count as "whitespace", this will remove all line terminators from each record. Beware, however, that if trailing space is significant or if the last field may be blank, then this will also remove spaces and tabs. In that case, s/[\r\n]+\z// is more appropriate.
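To make the difference between the three approaches concrete, here is a small sketch that applies each of them to the same record. The sample record (a DOS-style line whose last field is blank) and the variable names are assumptions chosen for illustration, and $/ is assumed to be the default "\n".

```perl
use strict;
use warnings;

# A DOS-style record with an empty last field, so it ends in TAB CR LF.
my $record = "Name\tStreet\t\r\n";

my $chomped = $record;
chomp $chomped;             # removes only the trailing "\n"

my $all_ws = $record;
$all_ws =~ s/\s+\z//;       # removes CR, LF and the trailing tab

my $eol = $record;
$eol =~ s/[\r\n]+\z//;      # removes only CR and LF

# Print each result with the invisible characters spelled out.
for my $case (['chomp', $chomped], ['\s+\z', $all_ws], ['[\r\n]+\z', $eol]) {
    my ($label, $text) = @$case;
    (my $visible = $text) =~ s/\r/<CR>/g;
    $visible =~ s/\t/<TAB>/g;
    print "$label: $visible\n";
}
```

With this record, chomp leaves the carriage return in place, s/\s+\z// also eats the tab that marks the empty last field, and s/[\r\n]+\z// keeps the tab while dropping both line-ending characters.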
This version of your program works fine.
use strict;
use warnings;
@ARGV = 'addr.txt';
open my $out, '>', 'output.txt' or die $!;
while (<>) {
s/\s+\z//;
my @fields = split /\t/;
print $out join("\t", @fields[1, 2, 2]), "\n";
}
close $out or die $!;
Upvotes: 4
Reputation: 64
Did you try removing the "\r" as well as the "\n"?
$line[2] =~ s/[\r\n]//g;
This could work: DOS line endings are "\r\n" (not only "\n"), so after chomp on Linux a "\r" is still attached to the last field.
Upvotes: 0