CodeMonkey JD

Reputation: 65

What is the simplest way to handle non-ASCII character input with JavaScript?

I'm working on a project which accepts user input of a name and subsequently navigates to a website to scrape data related to that name. Everything is going well, except when users input non-ASCII characters, such as accented or non-Western characters. I'm looking for the simplest way to store those characters in a string without having JavaScript convert them to a "�".

I've done some research on the issue and found similar questions to mine, but they all seem to address removing accents from characters with accent folding, rather than simply storing those characters for later use.

I am using the readline-sync Node module to simplify the process of requesting user input. If that is part of the problem, please let me know! Here is the entirety of the code from my test algorithm:

const rlSync = require('readline-sync');

// question() is synchronous and returns the answer as a string, so no await is needed.
const name = rlSync.question('Enter player name (Case Sensitive): ');
console.log(name);

This is all of the code from the test algorithm where the issue arises, so I know the source is not elsewhere. The primary test case I have been using up to this point has been any name with the letter "ë", although that is not the only problematic character. When I type "Hëllo" in the input prompt, the program outputs "H�llo".

Thank you all so much for any help you can provide! <3

UPDATE based on everyone's responses and a bunch of research: I think y'all are right about the console settings being an issue, rather than the code. Does anyone have a suggestion as to a good alternative CLI that uses UTF-8, or a means of updating the settings in the Windows command prompt to do so?

My Windows version is 10.0.18362.267. I have tried setting the language to "Beta: use UTF-8" via the administrative language settings, but this seems to present another issue: Instead of printing "H�llo", the cmd printed "Hllo".

(If this is beyond the scope of this forum I totally understand... just hoping to get as much help as I can!) :-)

Upvotes: 1

Views: 2972

Answers (2)

jdmayfield

Reputation: 1537

I re-read your question... I don't recall the node.js bit being there before, but....

Your issue is not in your program; it is in your terminal's settings. You need to change those settings to use UTF-8 and a font capable of displaying those characters, or switch to a terminal that can.

If your terminal only understands ASCII or is set to the wrong encoding, it shows the replacement character because it can't display those characters.

Node.js uses UTF-8 by default, so internally all should be well.
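
To illustrate where the "�" can come from, here is a minimal sketch. It assumes the terminal hands Node the bytes in a legacy single-byte encoding (Latin-1 is used here purely as a stand-in; the actual code page on your machine may differ), and the helper names are hypothetical:

```javascript
// Assumption: the terminal sends "ë" as one legacy byte instead of UTF-8.
// Latin-1 stands in for whatever code page the console really uses.
function encodeAsLegacy(text) {
  return Buffer.from(text, 'latin1');
}

// Decode raw bytes the way Node does by default: as UTF-8.
function decodeAsUtf8(bytes) {
  return bytes.toString('utf8');
}

const original = 'H\u00EBllo'; // "Hëllo", written with an escape to avoid source-encoding issues

// Wrong pairing: legacy bytes decoded as UTF-8. The lone byte 0xEB is not
// valid UTF-8, so the decoder substitutes U+FFFD (the replacement character).
const misdecoded = decodeAsUtf8(encodeAsLegacy(original));
console.log(JSON.stringify(misdecoded));

// Matching pairing: UTF-8 bytes decoded as UTF-8 round-trip cleanly.
const roundTripped = decodeAsUtf8(Buffer.from(original, 'utf8'));
console.log(roundTripped === original);
```

The string itself is never the problem; the damage happens at the boundary where bytes from the console are interpreted under the wrong encoding.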

Note: I checked up on readline-sync to be sure it's not the problem, and what I read seems to support this hypothesis.

https://github.com/anseki/readline-sync/issues/58

ECMAScript (Node.JS) already supports Unicode, by default. If your environment (not readlineSync) does not support those characters (e.g. you use Windows), the console.log method in your code can not print those when the answer contains those characters.


Old answer: If you're seeing that symbol in place of characters, it is almost certainly a font issue rather than a JavaScript issue. Try using a font that supports these characters. How you do this depends on what you're viewing the output with (i.e. terminal, browser, etc.). If that doesn't work, you may also need to specify UTF-8 as the encoding; how to do that likewise depends on what you're viewing the output with.

Upvotes: 2

Bojoer

Reputation: 948

This seems to be an issue with the text encoding settings on your server. If the text is stored in a DB, the DB may not be using UTF-8. If it happens directly in Node on output, for example when reading from a file and printing to the console, make sure you specify UTF-8 when reading the file. If it happens as in your case, using the Node CLI and reading from console input, then it is your console's text encoding that doesn't support multibyte characters. So this is a settings issue: make sure everything is in UTF-8 (or even UTF-16), since multibyte characters must be supported. Accented characters need more than one byte of storage, which is why a single-byte encoding can't hold them.
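
As a rough sketch of the two points above (accented characters taking more than one byte, and naming the encoding explicitly when reading a file), something like the following could be used. The helper names and the temporary file name are hypothetical, chosen only for this example:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return Buffer.byteLength(text, 'utf8');
}

// Write text to a temporary file and read it back, stating 'utf8' explicitly
// on both sides so the bytes are interpreted the same way each time.
function roundTripThroughFile(text) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'utf8-demo-'));
  const file = path.join(dir, 'name.txt'); // hypothetical file name
  fs.writeFileSync(file, text, 'utf8');
  const readBack = fs.readFileSync(file, 'utf8');
  fs.rmSync(dir, { recursive: true, force: true });
  return readBack;
}

console.log(utf8ByteLength('e'));      // plain ASCII letter
console.log(utf8ByteLength('\u00EB')); // "ë"
console.log(roundTripThroughFile('H\u00EBllo') === 'H\u00EBllo');
```

Without the encoding argument, `fs.readFileSync` returns a Buffer of raw bytes rather than a string, so passing `'utf8'` is what turns those bytes back into the intended characters.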

Upvotes: 0
