Paul

Reputation: 2031

Dealing with a non-ASCII character in RSpec testing

I'm using the DocSplit gem with Ruby 1.9.3 to create Unicode (UTF-8) versions of Word documents. To my surprise, while running a test today on a particular piece of one of these documents, I started running into character encoding inconsistencies.

I have tried a number of different methods to resolve the issue, which I list below, but the best success I've had so far is to remove all non-ASCII characters. This is far from ideal, as I don't think the characters are really going to be all that problematic in the DB.

gsub(/[^[:ascii:]]/, "")

This is a sample of what my output looks like vs. what I'm expecting:

My CODES'S APOSTROPHE

My CODES’S APOSTROPHE

The second apostrophe should look squiggly. If you paste it into irb, you get the following: \U+FFE2

I tried matching this character specifically with a regex, and it appears to work in Rubular. As soon as I put it in my model, however, I got a syntax error.

syntax error, unexpected $end, expecting ')'
raw_title = raw_title.gsub(/’/, "")

I also tried forcing the encoding to UTF-8, but everything is already in UTF-8 and this does not appear to have an effect. I tried forcing the output to US-ASCII, but I get a byte sequence error.

I also tried a few of the encoding options found in the Ruby library. These basically did the same thing as the regex.

It all comes down to this: I'm trying to match output for testing purposes. Should I even be concerned about these special characters? Is there a better way to match these characters without blindly removing them?

Upvotes: 1

Views: 1839

Answers (2)

achand8238

Reputation: 164

I tried using the above example, but even after that it kept failing, so I used Iconv to convert that specific character. This is what I used:

Iconv.conv('ASCII//IGNORE', 'UTF8', text_to_be_converted)

I tried what was given in the following link - How to get rid of non-ascii characters in ruby
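Note that Iconv emits a deprecation warning on Ruby 1.9.3 and was removed from the standard library in Ruby 2.0. A minimal sketch of the same idea using String#encode is below; the helper name strip_non_ascii is made up for illustration, and like the Iconv call it simply drops anything that has no ASCII equivalent.

```ruby
# encoding: utf-8

# Hypothetical helper (name is an assumption, not from the original answer):
# converts to US-ASCII and replaces every invalid or unmappable character
# with an empty string, which effectively deletes it.
def strip_non_ascii(text)
  text.encode('US-ASCII', invalid: :replace, undef: :replace, replace: '')
end

title = "My CODES\u2019S APOSTROPHE"   # \u2019 is the curly apostrophe
ascii_title = strip_non_ascii(title)
```

This still throws the apostrophe away, so it only helps if losing the character is acceptable.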

Upvotes: 0

Try adding:

# encoding: utf-8

at the top of the failing rspec file. This should ensure things like:

raw_title = raw_title.gsub(/’/, "")

in your spec work.
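If removing the character outright is not what you want, a small variation is to map it to a plain apostrophe. The sketch below assumes the expected string in the test uses a straight apostrophe; normalize_apostrophes is a hypothetical helper name, not something from the question.

```ruby
# encoding: utf-8

# Hypothetical helper: maps the right single quotation mark (U+2019)
# to a plain ASCII apostrophe instead of deleting it.
def normalize_apostrophes(text)
  text.gsub(/’/, "'")
end

raw_title = "My CODES’S APOSTROPHE"
normalized_title = normalize_apostrophes(raw_title)
```

The magic comment is what lets Ruby 1.9 parse the literal ’ in the source file; on Ruby 2.0 and later, source files default to UTF-8, so the comment is harmless but no longer required.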

Upvotes: 4
