mastaBlasta

Reputation: 5850

String encoding length mismatch between ruby and javascript

I'm on Ruby on Rails but that's not as significant (other than how Rails encodes request parameters).

I have a textbox where the user can enter text. I send this text using XHR back to my ruby backend which does a bunch of string processing. It looks for certain keywords and then returns to the client the list of keywords it found and their start indexes in the string.

I then process the keywords and indexes in javascript to do a bunch more things.

The problem is that if the text contains non-ASCII characters, Ruby's indexes do not match JavaScript's. JavaScript treats a non-ASCII character like any other single character, whereas my Ruby counts the multiple bytes of its encoding, which inflates the reported length of the string and makes the indexes useless.

Any advice on how to deal with such a situation? Simple escape/unescape encode/decode won't work.

Here's an example: Mary had ä little lamb

I have a keyword match in my DB for little lamb.

Ruby (after Rails parametrizing) returns a length for that string of 23, and the start index of little lamb as 12.

Javascript returns a string length of 22, and a start index of 11.
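The mismatch can be reproduced on Ruby 1.9+, where both byte and character counts are available (a sketch, assuming the input is UTF-8; Ruby 1.8 only reported the byte-based numbers):

```ruby
# encoding: utf-8
str = "Mary had ä little lamb"

str.bytesize  # => 23  bytes: "ä" is two bytes in UTF-8 (what Ruby 1.8 counted)
str.length    # => 22  characters (what JavaScript counts)
```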

Upvotes: 0

Views: 292

Answers (2)

Niels B.

Reputation: 6310

I haven't tried this, as I've never used Ruby 1.8.7, but perhaps mb_chars can help you.

http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html

Try running "Mary had ä little lamb".mb_chars.size
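If pulling in ActiveSupport's mb_chars isn't an option, a plain-Ruby alternative (a sketch, assuming the string's bytes are valid UTF-8) is to decode the bytes into codepoints with unpack, which also works on 1.8:

```ruby
# encoding: utf-8
str = "Mary had ä little lamb"

# "U*" decodes the string's bytes as UTF-8 codepoints, so the array length
# is the character count, matching JavaScript's string length here.
str.unpack("U*").length  # => 22
```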

Either way, you should upgrade to Ruby 2.1, as Ruby 1.8.7 is no longer supported.

Upvotes: 1

Patrick Oscity

Reputation: 54694

Counting characters instead of bytes is a change that was made in Ruby 1.9. To get lengths and indexes that match JavaScript, you need to upgrade to 1.9.3 or higher if you haven't already:

RUBY_VERSION
#=> "1.9.3"

str = 'Mary had ä little lamb'
keyword = 'little lamb'

str.size
#=> 22

str.index(keyword)
#=> 11
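For reference, the byte offset that Ruby 1.8 reported can still be computed on 1.9+ by measuring the byte size of the prefix before the match (same example strings as above):

```ruby
# encoding: utf-8
str = 'Mary had ä little lamb'
keyword = 'little lamb'

char_index = str.index(keyword)          # => 11, matches JavaScript
byte_index = str[0, char_index].bytesize # => 12, what Ruby 1.8 reported
```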

Upvotes: 1
