user391986

Reputation: 30886

CR vs LF perl parsing

I have a Perl script that parses a text file and splits it into an array of lines. It works fine when each line is terminated by LF, but lines terminated by CR are not handled properly. How can I modify this line to fix that?

my @allLines = split(/^/, $entireFile);

Edit: My file has a mixture of lines ending in LF and lines ending in CR. The lines that end in CR are collapsed together instead of being split.

Upvotes: 6

Views: 4683

Answers (4)

brian d foy

Reputation: 132720

If you have mixed line endings, you can normalize them by matching a generalized line ending:

 use v5.10;

 $entireFile =~ s/\R/\n/g;
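As a minimal sketch of how that substitution fits the original split: the sample string below is made up for illustration, and note that \R also matches other vertical whitespace (form feed, vertical tab, and the Unicode line separators), which may or may not be what you want.

```perl
use strict;
use warnings;
use v5.10;    # \R is available from Perl 5.10 onward

# Hypothetical sample data mixing LF, CR, and CRLF terminators.
my $entireFile = "one\ntwo\rthree\r\nfour\n";

# Replace every generalized line ending with a plain LF,
# working on a copy so the original string stays untouched.
(my $normalized = $entireFile) =~ s/\R/\n/g;

# The split from the question now sees a single terminator style
# and keeps the trailing "\n" on each element.
my @allLines = split(/^/, $normalized);

say scalar(@allLines);    # number of lines found
```

Because CRLF is matched as a single unit, it becomes one "\n" rather than two.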

You can also open a filehandle on a string and read lines just like you would from a file:

 open my $fh, '<', \ $entireFile;
 my @lines = <$fh>;
 close $fh;
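Keep in mind that reading from the string filehandle still splits on $/ (normally "\n"), so with mixed endings you would probably normalize first. A rough sketch, with made-up sample data:

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical sample data with mixed terminators.
my $entireFile = "alpha\rbeta\ngamma\r\n";

# Normalize first so the default record separator ("\n") finds every line.
(my $text = $entireFile) =~ s/\R/\n/g;

# Open a read-only filehandle on the string itself.
open my $fh, '<', \$text or die "Cannot open in-memory filehandle: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;          # drop the trailing "\n"
    push @lines, $line;
}
close $fh;

say for @lines;
```

Reading in a loop like this lets you process one line at a time instead of building the whole array up front.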

You can even open the string with the layers that cjm shows.

Upvotes: 5

cjm

Reputation: 62099

Perl can handle both CRLF and LF line-endings with the built-in :crlf PerlIO layer:

open(my $in, '<:crlf', $filename) or die "Cannot open $filename: $!";

will automatically convert CRLF line endings to LF, and leave LF line endings unchanged. But CR-only files are the odd man out. If you know that the file uses CR only, then you can set $/ to "\r" and it will read line by line (but it won't change the CR to an LF).
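For the CR-only case, a small sketch of setting $/ might look like this. The data and the in-memory filehandle are stand-ins so the example runs without an external file; a real script would open $filename instead.

```perl
use strict;
use warnings;

# Hypothetical CR-only data, read through an in-memory filehandle.
my $data = "first\rsecond\rthird\r";

open my $in, '<', \$data or die "Cannot open in-memory filehandle: $!";

my @records;
{
    local $/ = "\r";              # CR is the record separator inside this block only
    while (my $line = <$in>) {
        chomp $line;              # chomp removes the current $/, i.e. the trailing CR
        push @records, $line;
    }
}
close $in;

print "$_\n" for @records;
```

Using local keeps the change to $/ from leaking into the rest of the program.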

If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module. Then you can say:

open(my $in, '<:raw:eol(LF)', $filename) or die "Cannot open $filename: $!";

and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.

Another option is to set $/ to undef, which will read the entire file in one slurp. Then split it on /\r\n?|\n/. But that assumes that the file is small enough to fit in memory.
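Roughly, the slurp-and-split approach could be sketched as follows. The sample data and the in-memory filehandle are assumptions made so the snippet is self-contained.

```perl
use strict;
use warnings;

# Hypothetical mixed-ending data behind an in-memory filehandle;
# a real script would open a file by name instead.
my $data = "a\r\nb\rc\nd";

open my $in, '<', \$data or die "Cannot open in-memory filehandle: $!";

# With $/ undefined, a single read returns the whole input.
my $entireFile = do { local $/; <$in> };
close $in;

# Try CRLF first (\r\n?), then a lone LF, so CRLF is not split twice.
my @allLines = split /\r\n?|\n/, $entireFile;

print scalar(@allLines), "\n";
```

Unlike split(/^/, ...), this removes the line terminators from each element, and split also drops trailing empty fields, so blank lines at the very end of the input disappear.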

Upvotes: 12

Michał Wojciechowski

Reputation: 2490

You can probably just handle the different line endings when doing the split, e.g.:

my @allLines = split(/\r\n|\r|\n/, $entireFile);

Upvotes: 1

evil otto

Reputation: 10582

Perl will automatically split the input into lines if you read with <>, but you need to change $/ to "\r".

$/ is the "input record separator". See perldoc perlvar for details.

There is no way to change what the regular expression anchors ^ and $ consider to be the end of a line: it is always newline ("\n").

Upvotes: 0
