Reputation:
I am trying to parse a text file that has weird curly quotes like “ and ” and convert them into normal quotes like "
I tried this:
text.gsub!("“",'"')
text.gsub!("”",'"')
but when it's done, they are still there and show up as
\x93 and \x94
so I tried adding that too with no luck:
text.gsub!('\\x93', '"')
text.gsub!('\\x94', '"')
The problem is, when I try to show those weird quotes on a webpage, it makes that weird diamond with a question mark symbol: �
Upvotes: 2
Views: 718
Reputation: 14881
Your first gsubs should work. The second set of gsubs doesn't work because you're using single quotes with a double backslash, so the pattern is the four literal characters \x93 rather than the byte 0x93. Try the other way around:
text.gsub!("\x93", '"')
text.gsub!("\x94", '"')
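To see the difference between the two literals, here is a minimal sketch that compares their bytes. The variable names are made up for illustration.

```ruby
single = '\\x93'   # single quotes: backslash, "x", "9", "3"
double = "\x93"    # double quotes: the single byte 0x93

single_bytes = single.bytes.to_a   # [92, 120, 57, 51]
double_bytes = double.bytes.to_a   # [147]
```

Only the double-quoted form can match the raw byte that shows up as \x93 in the string.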
You can also do this in one line. Use the non-bang gsub when chaining, because gsub! returns nil when nothing was replaced:
text = text.gsub("\x93", '"').gsub("\x94", '"')
# or
text.gsub!(/(\x93|\x94)/, '"')
Are you sure the encoding of the string is correct?
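If the bytes really are \x93 and \x94, one likely explanation is that the file is Windows-1252, where those two bytes are the curly quotes “ and ”. That is an assumption about the input, not something the question confirms. Under that assumption, and on Ruby 1.9 or later, a rough sketch is to transcode to UTF-8 first and then replace the characters:

```ruby
# Assumption: the raw data is Windows-1252, where byte 0x93 is “ and 0x94 is ”.
raw = "\x93foo\x94".dup.force_encoding("Windows-1252")

# Transcode to UTF-8 so the curly quotes become real characters.
utf8 = raw.encode("UTF-8")

# Replace both curly quotes with a plain double quote.
plain = utf8.gsub(/[\u201C\u201D]/, '"')
# plain is "\"foo\""
```

When reading from disk, File.read(path, encoding: "Windows-1252:UTF-8") does the same transcoding in one step, where path is a placeholder for your file name.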
Upvotes: 0
Reputation: 49104
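Re: the second question of why the weird quotes show up on a web page as the � symbol:
The � symbol means the browser is decoding the page with a charset that doesn't match the bytes actually being sent. The usual fix is to make everything UTF-8: convert the text to UTF-8 and have the page declare UTF-8. To set the charset on the web server, see http://www.w3.org/International/O-HTTP-charset
If you can't change your web server, add a meta line declaring the charset in the head section of your web pages: http://www.utf-8.com/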
Larry
Upvotes: 0
Reputation: 284826
It seems to work:
irb(main):001:0> text = "“foo”"
=> "\342\200\234foo\342\200\235"
irb(main):002:0> text.gsub!("“",'"')
=> "\"foo\342\200\235"
irb(main):003:0> text.gsub!("”",'"')
=> "\"foo\""
You need to use a hex editor to figure out all the character codes involved.
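If a hex editor isn't handy, Ruby itself can dump the bytes. This is a small sketch, assuming Ruby 1.9 or later; the sample string stands in for whatever was read from the file.

```ruby
# Stand-in for the data read from the file, tagged as UTF-8.
text = "\x93foo\x94".dup.force_encoding("UTF-8")

# Each byte as two hex digits, similar to a hex editor view.
hex = text.unpack("C*").map { |b| format("%02x", b) }.join(" ")
# hex is "93 66 6f 6f 94"

# Does Ruby consider the bytes valid for the string's current encoding?
valid = text.valid_encoding?
# valid is false: 0x93 and 0x94 are not valid UTF-8 on their own
```

Seeing 93 and 94 here, rather than e2 80 9c and e2 80 9d, suggests the text is not UTF-8.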
Upvotes: 1