Reputation: 107
I have a script, MM.pl
, which is the “workhorse”, and a simple “patchfile” that it reads from. In this case, the patch file is targeting an .ini file for search and replace. Simple enough. It took me 5 days to realize the ini is encoded with null (\0
) characters between each letter. Since then, I have tried every option I could find both in code snippets, use::
functions, and regular expressions.
The only reason I found it was I used use Data::Printer;
to dump several values. In Notepad++, the ini appears to be encoded as USC-2 LE. It is important that MM.pl
handles the task instead of asking the user to “fix” the issue.
Update: This may provide a clue \xFF\xFE are the first 2 characters in the ini file. They appear after processing. The swap is not actually changing anything else like it is supposed to, but "reveals" 2 hidden characters.
Upvotes: 1
Views: 1024
Reputation: 107
To be honest this really is not a solution but a copout. After 4 weeks of trying and retrying methods, and reading and reading and reading, I have put it in park and switched to python to build the app. Several references in the perldocs mention UTF16 is "problematic" and also in mention situations it is treated differently.
Upvotes: 0
Reputation: 8588
When you read the file set the encoding
my $fh = IO::File->open( "< something.ini" );
binmode( $fh, ":encoding(UTF-16LE)" );
And when you output, you can write back whichever enoding you like. e.g.
my $out = IO::File->open( "> something-new.ini" );
binmode( $out, ":encoding(UTF-8)" );
Or even if you're dumping to the terminal
binmode( STDOUT, ":encoding(UTF-8)" );
Upvotes: 1
Reputation: 48536
As you noticed, those nulls aren't just junk to be stripped; they're part of the file's character encoding. So decode it:
open my $fh, '<:encoding(UCS-2)', 'file.ini';
Write it back out the same way once you're done.
Upvotes: 8