Reputation: 173
I'm running perl in Windows and I've got some text files whose lines end in CRLF (0d0a). Problem is, there are occasional 0a characters sprinkled around the file that split lines in Windows perl and muck with my processing. My thought is to preprocess the file, reading lines split by CRLF, but at least in Windows it insists on splitting on LF as well.
I've tried setting $/
local $/ = 0x0d;
open(my $fh, "<", $file) or die "Unable to open $file";
while (my $line = <$fh>) {
# do something to get rid of the 0x0a embedded in the line of text;
}
...but this reads multiple lines...it seems to miss the 0x0d altogether. I've also tried setting it to "\n", "\n\r", "\r" and "\r\n". There must be a simple way to do this!
I need to get rid of the stray 0a characters so I can correctly process the file. So, I need a script that will open the file, split it on CRLF, find any 0a that isn't preceded by a 0d, blast it, and save the result, line by line, to a new file.
Thanks for any help you can provide.
Upvotes: 2
Views: 119
Reputation: 173
This solution works by reading the data in using binary mode.
open(my $INFILE, "<:raw", $infile)
    or die("Can't open \"$infile\": $!\n");
open(my $OUTFILE, ">:raw", $outfile)
    or die("Can't create \"$outfile\": $!\n");
my $buffer = '';
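# Append each chunk after whatever was held back from the previous one.
while (sysread($INFILE, $buffer, 4*1024*1024, length($buffer))) {
    $buffer =~ s/(?<!\x0D)\x0A//g;
    # Hold back a trailing CR in case its LF starts the next chunk.
    my $keep = $buffer =~ /\x0D\z/ ? 1 : 0;
    print $OUTFILE substr($buffer, 0, length($buffer) - $keep, '');
}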
print $OUTFILE $buffer;
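To see what the substitution does on its own, here is a minimal sketch that applies it to an in-memory string. The sample data and the variable names are made up for illustration; only the regex comes from the code above.

```perl
use strict;
use warnings;

# Hypothetical sample: two CRLF-terminated records, the first with a stray LF inside.
my $sample = "first\x0A line\x0D\x0Asecond line\x0D\x0A";

# Delete every LF that is not immediately preceded by a CR.
(my $cleaned = $sample) =~ s/(?<!\x0D)\x0A//g;

# Make the control characters visible before printing.
(my $visible = $cleaned) =~ s/\x0D/<CR>/g;
$visible =~ s/\x0A/<LF>/g;
print "$visible\n";   # first line<CR><LF>second line<CR><LF>
```

The negative lookbehind matches without consuming the preceding character, so the CRLF pairs are left untouched.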
Upvotes: 2
Reputation: 386331
For starters, local $/ = 0x0d; should be local $/ = "\x0d"; (a string, not a number).
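A quick sketch of why the numeric literal doesn't work as a record separator: $/ is used as a string, and the number 0x0d stringifies to the two characters "13". The variable names here are only for illustration.

```perl
use strict;
use warnings;

my $number = 0x0d;      # numeric literal, value 13
my $string = "\x0d";    # one-character string containing CR

# The number turns into the text "13" when used as a string.
printf "as a string: '%s' (length %d)\n", $number, length($number);

# The first character of "13" is '1' (49), not CR (13).
printf "first char: %d vs %d\n", ord($number), ord($string);
```

So with local $/ = 0x0d; perl looks for the text "13" as the end of each record.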
Aside from that, the problem is that a :crlf layer is added to file handles in Windows by default. This causes CRLF to be converted to LF on read (and vice versa on write). There is therefore no CR in what you read, so you end up reading the entire file.
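The effect of the layer can be reproduced on any platform by requesting :crlf explicitly. The following is a rough sketch, assuming a throwaway temp file with made-up contents; count_records is a hypothetical helper, not something from the question.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical sample data: two CRLF-terminated lines, written without translation.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print $tmp_fh "one\x0D\x0Atwo\x0D\x0A";
close($tmp_fh) or die("Can't close \"$tmp_name\": $!\n");

# Count the records seen when splitting on CRLF under the given layer.
sub count_records {
    my ($layer) = @_;
    local $/ = "\x0D\x0A";
    open(my $fh, "<$layer", $tmp_name)
        or die("Can't open \"$tmp_name\": $!\n");
    my $count = () = <$fh>;
    return $count;
}

print ":crlf sees ", count_records(":crlf"), " record(s)\n";
print ":raw  sees ", count_records(":raw"), " record(s)\n";
```

Under :crlf the CRs are gone before $/ is ever consulted, so the whole file comes back as a single record; under :raw the two records are found.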
Simply removing/disabling the :crlf layer will do the trick.
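use feature qw( say );   # say is not enabled by default

local $/ = "\x0D\x0A";   # records end in CRLF
open(my $fh, "<:raw", $file)
    or die("Can't open \"$file\": $!\n");
while (<$fh>) {
    chomp;               # removes the trailing CRLF
    s/\x0A//g;           # removes any stray LF left in the record
    say;
}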
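Since the question asks for the cleaned lines to be saved to a new file, here is one way the same idea might be adapted. This is a sketch only: strip_stray_lf and the command-line handling are hypothetical, and the output is opened with :raw so the CRLF terminators are written explicitly rather than by the :crlf layer.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_name to $out_name, dropping LFs not part of a CRLF.
sub strip_stray_lf {
    my ($in_name, $out_name) = @_;

    open(my $in_fh, "<:raw", $in_name)
        or die("Can't open \"$in_name\": $!\n");
    open(my $out_fh, ">:raw", $out_name)
        or die("Can't create \"$out_name\": $!\n");

    local $/ = "\x0D\x0A";                 # records end in CRLF
    while (my $line = <$in_fh>) {
        my $removed = chomp($line);        # 2 if the record ended in CRLF, else 0
        $line =~ s/\x0A//g;                # drop stray LFs inside the record
        print {$out_fh} $line, ($removed ? "\x0D\x0A" : "");
    }

    close($out_fh)
        or die("Can't write \"$out_name\": $!\n");
    return;
}

# Assumed usage: perl script.pl input.txt output.txt
my ($in_name, $out_name) = @ARGV;
strip_stray_lf($in_name, $out_name) if defined $out_name;
```

A final record that lacks a CRLF terminator is written back without one, so the file's ending is left as it was.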
Upvotes: 2