Reputation: 5850
I'm on Ruby on Rails, though that matters little here (other than how Rails encodes request parameters).
I have a textbox where the user can enter text. I send this text via XHR to my Ruby backend, which does a bunch of string processing. It looks for certain keywords and then returns to the client the list of keywords it found and their start indexes in the string.
I then process the keywords and indexes in JavaScript to do a bunch more things.
The problem is that if the text contains non-ASCII characters, the indexes from Ruby do not match those from JavaScript. JavaScript treats a non-ASCII Unicode character like any other character, whereas Ruby converts it into a multi-byte sequence, which changes the length of the string and makes the indexes useless.
Any advice on how to deal with such a situation? Simple escape/unescape or encode/decode won't work.
Here's an example:
Mary had ä little lamb
I have a keyword match in my DB for little lamb.
Ruby (after Rails parametrizing) returns a length of 23 for that string, and a start index of 12 for little lamb.
JavaScript returns a string length of 22, and a start index of 11.
Upvotes: 0
Views: 292
Reputation: 6310
I haven't tried this, as I've never used Ruby 1.8.7, but perhaps mb_chars can help you.
http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html
Try running "Mary had ä little lamb".mb_chars.size
Either way, you should upgrade to Ruby 2.1, as Ruby 1.8.7 is no longer supported.
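To illustrate where the 23 and 12 most likely come from, here is a minimal sketch in plain Ruby (no Rails). It assumes the string is UTF-8 encoded and that you are on Ruby 2.0 or newer, since it uses String#b to get a byte-oriented copy; the variable names are only for illustration.

```ruby
str = 'Mary had ä little lamb'
keyword = 'little lamb'

# Character-based view (what JavaScript reports for this string).
char_length = str.length          # number of characters
char_index  = str.index(keyword)  # character offset of the keyword

# Byte-based view (what Ruby 1.8.7 effectively works with).
# "ä" takes two bytes in UTF-8, so everything after it shifts by one.
byte_length = str.bytesize               # number of UTF-8 bytes
byte_index  = str.b.index(keyword.b)     # byte offset of the keyword

puts "characters: length=#{char_length}, index=#{char_index}"
puts "bytes:      length=#{byte_length}, index=#{byte_index}"
```

If the byte-based numbers match what your backend returns, the mismatch is a bytes-versus-characters issue rather than anything Rails does to the parameters.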
Upvotes: 1
Reputation: 54694
Counting characters instead of bytes is a change made to Ruby in version 1.9. To get the same lengths and indexes as JavaScript, you may need to upgrade to 1.9.3 or higher if you haven't already:
RUBY_VERSION
#=> "1.9.3"
str = 'Mary had ä little lamb'
keyword = 'little lamb'
str.size
#=> 22
str.index(keyword)
#=> 11
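If some part of your pipeline still hands you byte offsets, one possible way to convert them is sketched below. The helper names byte_to_char_index and char_to_utf16_index are hypothetical (not part of Ruby or Rails), and the sketch assumes valid UTF-8 input, Ruby 1.9.3 or newer (for String#byteslice), and a byte offset that falls on a character boundary. The second helper covers a related caveat: JavaScript string indexes count UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) counts as two in JavaScript but as one in Ruby.

```ruby
# Hypothetical helper: turn a byte offset into a character offset by
# counting the characters in the bytes that precede it.
# Assumes byte_index lies on a character boundary of a valid UTF-8 string.
def byte_to_char_index(str, byte_index)
  str.byteslice(0, byte_index).length
end

# Hypothetical helper: turn a Ruby character offset into the offset
# JavaScript would use (UTF-16 code units, two bytes each).
def char_to_utf16_index(str, char_index)
  str[0, char_index].encode('UTF-16LE').bytesize / 2
end

str = 'Mary had ä little lamb'
puts byte_to_char_index(str, 12)

emoji_str = "a\u{1F600}b"
puts char_to_utf16_index(emoji_str, emoji_str.index('b'))
```

For text that only contains characters like ä, the character offsets from Ruby 1.9+ already line up with JavaScript, so the second helper only matters if users can enter emoji or other supplementary characters.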
Upvotes: 1