porton

Reputation: 5805

Reliable Perl encoding with File::Slurp

I need to replace every occurrence of http:// with // in a file. The file may be (at least) in UTF-8, CP1251, or CP1255.

Does the following work?

use File::Slurp;
my $Text = read_file($File, binmode=>':raw');
$Text =~ s{http://}{//}gi;
write_file($File, {atomic=>1, binmode=>':raw'}, $Text);

It seems correct, but I need to be sure that the file will not be damaged, whatever its encoding. Please help me confirm this.

Upvotes: 0

Views: 641

Answers (2)

interduo

Reputation: 401

It's no longer recommended to use File::Slurp (see here).

I would recommend using Path::Tiny. It's easy to use, works with both files and directories, only uses core modules, and has slurp/spew methods specifically for utf8 and raw, so you shouldn't have a problem with the encoding.

Usage:

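use Path::Tiny;

# Read the file as raw bytes, with no decoding layer applied
my $Text = path($File)->slurp_raw;
# "http://" is pure ASCII, so it can be matched directly on the bytes
$Text =~ s{http://}{//}gi;
# Write the bytes back unchanged apart from the substitution
path($File)->spew_raw($Text);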

Update: From documentation on spew:

Writes data to a file atomically. The file is written to a temporary file in the same directory, then renamed over the original. An optional hash reference may be used to pass options. The only option is binmode, which is passed to binmode() on the handle used for writing.

spew_raw is like spew with a binmode of :unix for a fast, unbuffered, raw write.

spew_utf8 is like spew with a binmode of :unix:encoding(UTF-8) (or PerlIO::utf8_strict). If Unicode::UTF8 0.58+ is installed, a raw spew will be done instead on the data encoded with Unicode::UTF8.
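
To make the "temporary file in the same directory, then renamed over the original" step concrete, here is a minimal sketch of that pattern using only core modules. It is not Path::Tiny's actual code; the helper names write_raw_atomically and read_raw are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: write raw bytes to a temp file next to $file,
# then rename it over $file so readers never see a half-written file.
sub write_raw_atomically {
    my ($file, $bytes) = @_;
    my ($fh, $tmp) = tempfile('.tmpXXXXXX', DIR => dirname($file));
    binmode($fh, ':raw')  or die "binmode $tmp: $!";
    print {$fh} $bytes    or die "write $tmp: $!";
    close($fh)            or die "close $tmp: $!";
    unless (rename($tmp, $file)) {
        my $err = $!;
        unlink $tmp;      # do not leave the temp file behind on failure
        die "rename $tmp -> $file: $err";
    }
    return;
}

# Hypothetical helper: read the whole file as raw bytes.
sub read_raw {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "open $file: $!";
    local $/;             # slurp mode
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}

# Demo on a scratch file, so no real file is touched.
my ($demo_fh, $demo) = tempfile();
close($demo_fh);
write_raw_atomically($demo, "see http://example.com\n");
(my $text = read_raw($demo)) =~ s{http://}{//}gi;
write_raw_atomically($demo, $text);
print read_raw($demo);
unlink $demo;
```

The temp file must live in the same directory as the target so that the rename stays on one filesystem. Note that this sketch does not copy the original file's permissions or ownership onto the new file.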

Upvotes: 2

David Verdin

Reputation: 490

This answer won't make you sure, though I hope it can help.

I don't see any problem with your script (I tested it with UTF-8 and ISO-8859-1 without problems), though there is some discussion about whether File::Slurp handles encodings correctly: http://blogs.perl.org/users/leon_timmermans/2015/08/fileslurp-is-broken-and-wrong.html

In this answer on a similar subject, the author recommends File::Slurper as an alternative, due to better encoding handling: https://stackoverflow.com/a/206682/6193608
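
A check like the one I ran can be sketched for the three encodings named in the question. This is a rough sketch, not a proof: the sample strings and the helper names strip_scheme and roundtrip_ok are made up for illustration. It relies on UTF-8, CP1251, and CP1255 all being ASCII-compatible, so the bytes of "http://" can only ever mean the ASCII text "http://". An encoding such as UTF-16 would not be damaged by the raw substitution either, but nothing would match in it.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The same replacement as in the question, applied to whatever string it gets.
sub strip_scheme {
    my ($text) = @_;
    $text =~ s{http://}{//}gi;
    return $text;
}

# Hypothetical helper: encode $chars as $encoding, edit the raw bytes, decode
# the result, and compare it with the same edit done on the characters.
sub roundtrip_ok {
    my ($encoding, $chars) = @_;
    # FB_CROAK dies on data the encoding cannot represent;
    # LEAVE_SRC keeps Encode from modifying its input.
    my $check  = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $bytes  = encode($encoding, $chars, $check);
    my $edited = strip_scheme($bytes);
    my $got    = decode($encoding, $edited, $check);
    return $got eq strip_scheme($chars);
}

# Made-up sample text: Cyrillic and Hebrew letters around the URLs.
my %samples = (
    'UTF-8'  => "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442} http://example.com \x{5E9}\x{5DC}\x{5D5}\x{5DD}",
    'cp1251' => "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442} HTTP://example.com/?u=http://b",
    'cp1255' => "\x{5E9}\x{5DC}\x{5D5}\x{5DD} http://example.com",
);

for my $encoding (sort keys %samples) {
    my $status = roundtrip_ok($encoding, $samples{$encoding}) ? 'ok' : 'DAMAGED';
    printf "%-6s %s\n", $encoding, $status;
}
```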

Upvotes: 3
