Reputation: 11158
I previously only had vague awareness of character encoding issues, but answers to a question today got me thinking about it. The following provided more food for thought too:
perlunitut - Perl Unicode Tutorial
The only place that I've seen mention of stating the character encoding (e.g. use utf8; for most of us) of our source code as a "best practice" was in the answers to the previously mentioned question.
In addition, perlunitut mentions that we should use Encode qw{encode decode}; in our "standard heading" in Perl programs. Thus it seems that another "best practice" should be to decode all input and to encode all output.
What do you think?
Upvotes: 5
Views: 253
Reputation: 240010
use utf8 actually has fairly little to do with it -- it only tells Perl that the source file itself is written in UTF-8. Almost no one uses Unicode identifiers, and a program can easily be encoding-aware without ever including UTF-8 string literals in the code.
But yes, the best wisdom that I know of for dealing with encodings is this:
The very existence of a million different character sets and a million different encodings should be a detail of the interface as much as possible. There are some things you'll still have to keep in mind -- for example different collations for different languages -- but it's an ideal to strive for anyway, and following it as far as possible should greatly reduce the number of "encoding issues" in your code.
To answer your question more directly, yes -- if you're reading textual data from outside without decoding, or sending data anywhere without encoding, there's a very good chance that you're making a mistake, and that your code will break when someone else uses it in a locale different from yours.
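As a rough illustration of "decode on the way in, encode on the way out", here is a minimal sketch using the core Encode module. The sample bytes and the variable names are made up for the example, and UTF-8 is only an assumption about what the outside world is sending; in real code the encoding is whatever your input source actually uses.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "café", as they might
# arrive from a file, a socket, or a command-line argument.
my $bytes_in = "caf\xC3\xA9";

# Decode at the boundary: bytes in, character string out.
# FB_CROAK makes malformed input die instead of passing silently;
# LEAVE_SRC keeps Encode from modifying $bytes_in in place.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $text  = decode('UTF-8', $bytes_in, $check);

# Inside the program, work with characters, not bytes.
my $upper = uc $text;

# Encode at the boundary: character string in, bytes out.
my $bytes_out = encode('UTF-8', $upper, $check);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length $bytes_in, length $text, length $bytes_out;
```

The byte string is 5 bytes long but the decoded string is 4 characters, which is exactly the kind of difference that goes wrong when text is processed undecoded. For filehandles, a PerlIO layer such as open my $fh, '<:encoding(UTF-8)', $path does the same decoding step for you on every read.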
Upvotes: 14