Reputation: 215
TL;DR I have input that looks like this:
इस परीक्षण के लिए है
Something
Zürich
This data is then piped through a few programs and is ultimately inserted into a MongoDB database. But by the time I query it back out and try to display it on a web page, it's all garbage.
I've found a lot of questions on how to encode these things but all the answers assume you want everything encoded and do not discuss how to decode it for display.
I only want the "weird" stuff encoded, so for the above I'd like to get output something like this:
0x1234;0x8737;0x838784; ...
Something
Z0x8387;rich
which would store fine in a database, and would survive a vim edit or whatever else, but then when I pull it out I want it to render correctly.
So how do I do that: encode in Perl and decode in JavaScript?
PS: I don't know what that string of symbols means, just found it somewhere. Sorry if it's offensive or something. Thanks!
Edit: choroba's answer is a very good start. Here is an example of what the algorithm produces:
input: 株式会社イノ設計
output: 0x230;0x160;0x170;0x229;0x188;0x143;0x228;0x188;0x154;0x231;0x164;0x190;0x227;0x130;0x164;0x227;0x131;0x142;0x232;0x168;0x173;0x232;0x168;0x136;
Now how do I render that in JavaScript? 0xNN was just an example of what I imagined the answer would look like, but if there's a better way, by all means!
Thanks!
Upvotes: 0
Views: 82
Reputation: 241918
Here's an example that produces something similar to what you want:
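#! /usr/bin/perl
use warnings;
use strict;

# Replace every character in the range U+0080..U+FFFF with 0x<decimal code point>;
# /e evaluates the replacement as code, /g replaces all matches,
# /r returns the modified copy instead of changing $in.
sub escape {
    my ($in) = @_;
    $in =~ s/([\x{80}-\x{ffff}])/sprintf '0x%d;', ord $1/ger
}

# Quick self-check: die with both strings if the result is unexpected.
my $in  = "Z\N{LATIN SMALL LETTER U WITH DIAERESIS}rich";
my $out = 'Z0x252;rich';
$out eq escape($in) or die escape($in) . "\n$out\n";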
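One caveat: the example assumes the input string holds decoded characters. The sample output in the question's edit (230, 160, 170, ...) consists of UTF-8 byte values, not code points, which suggests the input was not decoded before escaping. If the data is already stored that way, a minimal JavaScript sketch for turning it back into text could look like this. The helper name decodeByteMarkers is made up for illustration, and it assumes every run of markers is a valid UTF-8 byte sequence written in decimal.

```javascript
// Hypothetical helper: turns "0x230;0x160;0x170;" (decimal UTF-8 byte values)
// back into the text those bytes encode.
function decodeByteMarkers(text) {
  const decoder = new TextDecoder("utf-8");
  // Match whole runs of consecutive markers so multi-byte sequences stay together.
  return text.replace(/(?:0x\d+;)+/g, (run) => {
    const bytes = Array.from(run.matchAll(/0x(\d+);/g), (m) => Number(m[1]));
    // Assumption: values above 255 are not bytes, so leave such runs untouched.
    if (bytes.some((b) => b > 255)) {
      return run;
    }
    return decoder.decode(Uint8Array.from(bytes));
  });
}

console.log(decodeByteMarkers("Z0x195;0x188;rich"));
```

TextDecoder is built into browsers and recent Node.js versions. Decoding the input in Perl before escaping would avoid the need for this step altogether.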
You seem to want decimal digits after 0x. That's confusing, as 0x usually means hexadecimal. To get hexadecimal codes, change the sprintf template to '0x%x;'.
Also note that if someone enters a string such as 0x123; into your data directly, it will be indistinguishable from an escaped character, so decoding will corrupt the data.
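To make the collision concrete, here is a minimal JavaScript sketch of a decoder for the decimal 0xNNN; format produced by escape(). The function name decodeMarkers is hypothetical, and the sketch assumes the markers hold decimal code points, as in the Perl example.

```javascript
// Hypothetical decoder for the decimal "0xNNN;" markers (code points, not bytes).
function decodeMarkers(text) {
  return text.replace(/0x(\d+);/g, (match, digits) => {
    const codePoint = Number(digits);
    // Leave anything that is not a valid Unicode code point untouched.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeMarkers("Z0x252;rich"));      // marker written by escape(): prints "Zürich"
console.log(decodeMarkers("address 0x65; ok")); // typed literally by a user: prints "address A ok"
```

The second call shows the problem: the decoder cannot tell a marker from text that merely looks like one. An unambiguous format would also have to escape the marker prefix itself.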
If you use &# instead of 0x at the beginning of each replaced character, the browser will render the characters correctly: Z&#252;rich renders as "Zürich".
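When the string is inserted as HTML, the browser does this decoding for you. If you need the plain text in JavaScript instead (for example, to assign it to textContent), a small sketch like the following could decode the numeric references by hand. The name decodeNumericRefs is made up for illustration, and the sketch handles only numeric references (decimal &#NNN; and hexadecimal &#xHH;), not named entities such as &amp;.

```javascript
// Hypothetical helper: decodes numeric character references into plain text.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references outside the Unicode range untouched.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericRefs("Z&#252;rich")); // prints "Zürich"
```

String.fromCodePoint is used rather than String.fromCharCode so that code points above U+FFFF also decode correctly.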
Upvotes: 2