Anatoly Popov

Reputation: 13

Node.js: Russian characters in the query string are decoded incorrectly

I'm using

curl -Lk 'https://my-site.ru/?q=повар'

and in my Node.js server's req.url

I see ?q=водиÑелÑ

The locale's character encoding is UTF-8. What do I need to do to decode the characters correctly?

Upvotes: 0

Views: 509

Answers (1)

Amadan

Reputation: 198418

The problem is that curl is not sending the URL correctly. To be standards-compliant, a URL has to be percent-encoded (URL encoding), so повар should be sent as %D0%BF%D0%BE%D0%B2%D0%B0%D1%80. To make curl send it in this format, you can do the following (note that -G is needed to force the GET method; otherwise --data-urlencode makes it a POST request):

curl -GLk 'https://my-site.ru/' --data-urlencode 'q=повар'

Then req.query.q will be "повар".
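Since the question reads req.url in a plain Node.js server rather than through Express, here is a minimal sketch of both sides without Express, assuming Node.js 10+ where URL is a global. The helper names buildUrl and readQueryParam are hypothetical, and the base 'http://localhost' is only a placeholder needed because req.url is a relative path.

```javascript
// Client side: UTF-8 encode and percent-encode the value, which is the
// standard form that curl --data-urlencode produces for the query string.
function buildUrl(base, name, value) {
  return `${base}?${encodeURIComponent(name)}=${encodeURIComponent(value)}`;
}

// Server side: req.url holds only the path and query (e.g. "/?q=..."), so a
// placeholder base is supplied to the WHATWG URL parser. searchParams.get()
// percent-decodes the value as UTF-8 and returns null if the key is absent.
function readQueryParam(reqUrl, name) {
  return new URL(reqUrl, 'http://localhost').searchParams.get(name);
}

console.log(buildUrl('https://my-site.ru/', 'q', 'повар'));
// https://my-site.ru/?q=%D0%BF%D0%BE%D0%B2%D0%B0%D1%80
console.log(readQueryParam('/?q=%D0%BF%D0%BE%D0%B2%D0%B0%D1%80', 'q'));
// повар
```

Inside a handler such as http.createServer((req, res) => { ... }), you would call readQueryParam(req.url, 'q').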

Your curl sends the query as raw UTF-8 bytes, which is non-standard in URLs. I get Ð¿Ð¾Ð²Ð°Ñ (not what you stated). The first letter, п, becomes the two bytes D0 BF, which Express does not decode as UTF-8; instead it treats each byte as its own character: 'LATIN CAPITAL LETTER ETH' (U+00D0) and 'INVERTED QUESTION MARK' (U+00BF), i.e. Ð¿. It is possible to repair this; in Node, the easiest way would be the utf8 package (utf8.decode(req.query.q)). However, I would strongly suggest you just follow the standard.
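The same repair can be sketched without the utf8 package by using Node's built-in Buffer instead (a swapped-in technique, not the one named above): re-encode the mis-decoded string as Latin-1 to recover the original bytes, then decode those bytes as UTF-8. This assumes every character in the broken string is in the range U+0000 to U+00FF, i.e. each one really stands for a single byte, and that no byte was lost along the way; fixMojibake is a hypothetical name.

```javascript
// Each UTF-8 byte was turned into one Latin-1 character (0xD0 -> 'Ð', 0xBF -> '¿').
// Encoding back to Latin-1 restores the raw bytes; decoding them as UTF-8
// restores the intended text. Assumes all code points are <= 0xFF.
function fixMojibake(broken) {
  return Buffer.from(broken, 'latin1').toString('utf8');
}

// Reproduce the problem: take the UTF-8 bytes of 'повар' and read them as Latin-1.
const broken = Buffer.from('повар', 'utf8').toString('latin1');
console.log(broken === '\u00D0\u00BF\u00D0\u00BE\u00D0\u00B2\u00D0\u00B0\u00D1\u0080'); // true
console.log(fixMojibake(broken)); // повар
```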

Note that when you type https://my-site.ru/?q=повар in your browser (as opposed to curl), your browser will actually correctly send https://my-site.ru/?q=%D0%BF%D0%BE%D0%B2%D0%B0%D1%80 instead.
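Node's global URL class implements the WHATWG URL Standard that browsers also follow, so it can serve as a rough check of what a browser would send for the typed address. This is a sketch that only shows URL serialization, not an actual HTTP request.

```javascript
// Parse the address as typed: non-ASCII characters in the query are
// UTF-8 encoded and then percent-encoded during serialization.
const typed = new URL('https://my-site.ru/?q=повар');
console.log(typed.href); // https://my-site.ru/?q=%D0%BF%D0%BE%D0%B2%D0%B0%D1%80
console.log(typed.searchParams.get('q')); // повар
```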

Upvotes: 1
