IMPERATOR

Reputation: 287

Perl incorrectly adding newline characters?

This is my tab-delimited input file:

Name<tab>Street<tab>Address

This is how I want my output file to look:

Street<tab>Address<tab>Address

(Yes, the last column is meant to be duplicated.) My output file looks like this instead:

Street<tab>Address
         <tab>Address

What is going on with Perl? This is my code:

open (IN, $ARGV[0]);

open (OUT, ">output.txt");
while ($line = <IN>){

    chomp $line;
    @line=split/\t/,$line;

    $line[2]=~s/\n//g;
   print OUT $line[1]."\t".$line[2]."\t".$line[2]."\n";
}

close( OUT);

Upvotes: 1

Views: 355

Answers (4)

Borodin

Reputation: 126722

If you know the origin of your data file beforehand, and know it to be a DOS-like file that terminates records with CR LF, you can use the PerlIO crlf layer when you open the file, like this:

open my $in, '<:crlf', $ARGV[0] or die $!;

then all records will appear to end in just "\n" when they are read on a Linux system.
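As a rough check of the crlf layer, the sketch below writes a one-line sample file with an explicit CR LF terminator and reads it back through '<:crlf'. The temporary file and its contents are made up for illustration; only the open mode comes from the answer above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file with an explicit CR LF terminator.
# ':raw' makes sure the bytes are written exactly as given.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print $tmp "Name\tStreet\tAddress\r\n";
close $tmp or die $!;

# Read it back through the crlf layer, which turns CR LF into "\n".
open my $in, '<:crlf', $path or die $!;
my $line = <$in>;
close $in or die $!;

chomp $line;    # removes the single "\n" that is left
my @fields = split /\t/, $line;
my $has_cr = $fields[2] =~ /\r/ ? 'yes' : 'no';
print "last field: '$fields[2]', contains CR: $has_cr\n";
```

With the layer in place, the last field should come out as plain 'Address' with no carriage return attached.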

A general solution to this problem is to install PerlIO::eol. Then you can write

open my $in, '<:raw:eol(LF)', $ARGV[0] or die $!;

and the line ending will always be "\n" regardless of the origin of the file, and regardless of the platform where Perl is running.

Upvotes: 2

Cole Tierney

Reputation: 10314

Another way to avoid end of line problems is to only capture the characters you're interested in:

open (IN, $ARGV[0]);

open (OUT, ">output.txt");
while (<IN>) {
    # Capture the second and third columns (Street and Address), skipping the first.
    print OUT "$1\t$2\t$2\n" if /^\w+\t(\w+)\t(\w+)\s*/;
}

close( OUT);

Upvotes: 0

Borodin

Reputation: 126722

First of all, you should always

  • use strict and use warnings for even the most trivial programs. You will also need to declare each of your variables using my as close as possible to their first use

  • use lexical file handles and the three-parameter form of open

  • check the success of every open call, and die with a string that includes $! to show the reason for the failure

Note also that there is no need to explicitly open files named on the command line that appear in @ARGV: you can just read from them using <>.

As others have said, it looks like you are reading a file of DOS or Windows origin on a Linux system. Instead of using chomp, you can remove all trailing whitespace characters from each line using s/\s+\z//. Since CR and LF both count as "whitespace", this will remove all line terminators from each record. Beware, however, that, if trailing space is significant or if the last field may be blank, then this will also remove spaces and tabs. In that case, s/[\r\n]+\z// is more appropriate.

This version of your program works fine.

use strict;
use warnings;

@ARGV = 'addr.txt';

open my $out, '>', 'output.txt' or die $!;

while (<>) {
  s/\s+\z//;
  my @fields = split /\t/;
  print $out join("\t", @fields[1, 2, 2]), "\n";
}

close $out or die $!;
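To see how chomp and the two substitutions mentioned above differ, here is a small sketch that applies each to the same record. The sample record (a CR LF line whose last field is blank) and the show helper are hypothetical, chosen only to make the control characters visible.

```perl
use strict;
use warnings;

# Make control characters visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    $s =~ s/\t/\\t/g;
    return $s;
}

# A record as read from a CR LF file on Linux, with a blank last field.
my $record = "Name\tStreet\t\r\n";

my $chomped = $record;
chomp $chomped;                            # removes only the trailing "\n"

(my $all_ws = $record) =~ s/\s+\z//;       # removes CR, LF and the trailing tab
(my $eol    = $record) =~ s/[\r\n]+\z//;   # removes only CR and LF

print 'chomp:          ', show($chomped), "\n";
print 's/\s+\z//:      ', show($all_ws),  "\n";
print 's/[\r\n]+\z//:  ', show($eol),     "\n";
```

After chomp the record still ends in "\r", which is what sends the cursor back to the start of the line in the question's output. The \s+ version also eats the tab that marks the blank last field, while the [\r\n]+ version leaves it in place.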

Upvotes: 4

Jose

Reputation: 64

Did you try removing the "\r" as well as the "\n"?

$line[2] =~ s/[\r\n]+//g;

That could work. DOS line endings are "\r\n" rather than just "\n", so after chomp a "\r" is still left at the end of the last field.

Upvotes: 0
