Don Wool

Reputation: 267

Rogue Character in tab delimited file causing error

I am trying to read and parse a file line by line, but there is some kind of delimiter at the end of the file that is causing strange behavior.

Here is what the lines of the file I am reading look like:

20111129        AMEX    BHO     OTCBB   BHODD
20111129        AMEX    LCAPA   NASDAQ  LMCA

The code to read it is straightforward:

my @line = <INFO>;
foreach my $line (@line) {
    chomp($line);
    my @vals = split(/\t/, $line);

    my $date     = $vals[0];
    my $old_exch = $vals[1];
    my $old_symb = $vals[2];
    my $new_exch = $vals[3];
    my $new_symb = $vals[4];

    print "0> date '$date'\n";
    print "1> old Exch '$old_exch'\n";
    print "2> old symb '$old_symb'\n";
    print "3> new Exch '$new_exch'\n";
    print "4> new symb '$new_symb'\n";
}

The output looks like this:

 0> date '20111129'
 1> old Exch 'AMEX'
 2> old symb 'BHO'
 3> new Exch 'OTCBB'
 '> new symb 'BHODD

So there appears to be a character at the end of each line that causes the trailing ' to print at the beginning of the line, overwriting the 4 that should appear there. It behaves like a character that moves the print position back to the beginning of the line. Is there any way to 'chomp out' this rogue character? Or perhaps there is a bug in my code, but I have other scripts doing something similar...

Thanks much in advance!

Don

Upvotes: 0

Views: 163

Answers (1)

choroba

Reputation: 241828

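The file has Windows line endings ("\r\n"), and chomp removes only the "\n". The rogue character is the remaining "\r" (carriage return), which moves the cursor back to the start of the line. You can remove it with a regular expression: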

s/\r//;
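
Note that a bare s/\r//; acts on $_, so in a loop that uses a named variable it needs to be bound to that variable. A minimal sketch of how this might look, assuming the same tab-delimited layout as in the question (parse_line is a hypothetical helper name, not part of the original script):

```perl
use strict;
use warnings;

# Split one tab-delimited record, tolerating both Unix (\n) and
# Windows (\r\n) line endings.
# parse_line is a hypothetical helper name used only for illustration.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # strip the line ending, carriage return included
    return split /\t/, $line;
}

# Sample record with a Windows line ending, like the lines in the problem file.
my @vals = parse_line("20111129\tAMEX\tBHO\tOTCBB\tBHODD\r\n");
print "4> new symb '$vals[4]'\n";    # prints: 4> new symb 'BHODD'
```

Anchoring the pattern at the end of the string replaces the separate chomp and leaves any "\r" elsewhere in the line untouched.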

Or, you can specify the :crlf layer when opening the file.
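
A rough sketch of the :crlf approach, assuming a three-argument open with a lexical filehandle instead of the bareword INFO handle from the question. The temporary file is created here only so the example is self-contained; in the real script the path would be the data file:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file with Windows (CRLF) line endings.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;    # write the bytes exactly as given
print {$tmp_fh} "20111129\tAMEX\tBHO\tOTCBB\tBHODD\r\n";
close $tmp_fh or die "close failed: $!";

# The :crlf layer translates "\r\n" to "\n" on input, so chomp is enough.
open my $info, '<:crlf', $path or die "Cannot open $path: $!";
my @rows;
while (my $line = <$info>) {
    chomp $line;
    push @rows, [ split /\t/, $line ];
}
close $info;

print "4> new symb '$rows[0][4]'\n";    # prints: 4> new symb 'BHODD'
```

With the layer in place the parsing loop itself needs no changes, which may be preferable when several scripts read the same Windows-generated files.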

Upvotes: 4
