mastaBlasta

Reputation: 5850

String encoding length mismatch between ruby and javascript

I'm on Ruby on Rails but that's not as significant (other than how Rails encodes request parameters).

I have a textbox where the user can enter text. I send this text using XHR back to my ruby backend which does a bunch of string processing. It looks for certain keywords and then returns to the client the list of keywords it found and their start indexes in the string.

I then process the keywords and indexes in javascript to do a bunch more things.

The problem is that if the text contains non-ASCII characters, Ruby's indexes do not match JavaScript's. JavaScript treats a non-ASCII character like any other single character, whereas my Ruby counts the multiple bytes of its encoding, which inflates the reported length of the string and makes the indexes useless.

Any advice on how to deal with such a situation? Simple escape/unescape encode/decode won't work.

Here's an example: Mary had ä little lamb

I have a keyword match in my DB for little lamb.

Ruby (after Rails parametrizing) returns a length for that string of 23, and the start index of little lamb as 12.

Javascript returns a string length of 22, and a start index of 11.
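The mismatch can be reproduced on Ruby 1.9+, where both byte and character counts are available (a sketch, assuming the input is UTF-8; Ruby 1.8 only reported the byte-based numbers):

```ruby
# encoding: utf-8
str = "Mary had ä little lamb"

str.bytesize  # => 23  bytes: "ä" is two bytes in UTF-8 (what Ruby 1.8 counted)
str.length    # => 22  characters (what JavaScript counts)
```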

Upvotes: 0

Views: 292

Answers (2)

Niels B.

Reputation: 6310

I haven't tried this, as I've never used Ruby 1.8.7, but perhaps mb_chars can help you.

http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html

Try running "Mary had ä little lamb".mb_chars.size
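If pulling in ActiveSupport's mb_chars isn't an option, a plain-Ruby alternative (a sketch, assuming the string's bytes are valid UTF-8) is to decode the bytes into codepoints with unpack, which also works on 1.8:

```ruby
# encoding: utf-8
str = "Mary had ä little lamb"

# "U*" decodes the string's bytes as UTF-8 codepoints, so the array length
# is the character count, matching JavaScript's string length here.
str.unpack("U*").length  # => 22
```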

Either way, you should upgrade to Ruby 2.1, as Ruby 1.8.7 is no longer supported.

Upvotes: 1

Patrick Oscity

Reputation: 54694

Counting characters instead of bytes is a change that was made in Ruby 1.9. To get lengths and indexes that match JavaScript, you need to upgrade to 1.9.3 or higher if you haven't already:

RUBY_VERSION
#=> "1.9.3"

str = 'Mary had ä little lamb'
keyword = 'little lamb'

str.size
#=> 22

str.index(keyword)
#=> 11
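For reference, the byte offset that Ruby 1.8 reported can still be computed on 1.9+ by measuring the byte size of the prefix before the match (same example strings as above):

```ruby
# encoding: utf-8
str = 'Mary had ä little lamb'
keyword = 'little lamb'

char_index = str.index(keyword)          # => 11, matches JavaScript
byte_index = str[0, char_index].bytesize # => 12, what Ruby 1.8 reported
```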

Upvotes: 1
