Reputation: 8846
Update: Apparently these are control characters, not Unicode characters.
I'm trying to parse an XML file which has an odd character in it that makes it invalid and is causing my tools (Firefox, Nokogiri) to complain.
Here's what the character looks like in Firefox, and what it looks like when I copy and paste it into Textmate (I'm on OS X obviously).
crazy characters http://img.skitch.com/20090811-ghu43k5u9nhpcjmh443dpq76jp.preview.jpg
Rather than just cryptic icons and little grey diamonds, I'd really like to know what these characters are (e.g. hex/dec codes), but I'm not sure how to figure that out.
Upvotes: 2
Views: 798
Reputation: 33732
You can download the Ruby hexdump extension for class String and print a hexdump directly from Ruby:
require 'hexdump'
#... whatever you do in your program
puts your_string.hexdump
The output looks like what you get from hexdump -C in a shell.
See: Ruby Hexdump method for Class String
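If installing the extension isn't an option, plain Ruby can get you most of the way. The following is only a rough sketch: `simple_hexdump` is a hypothetical helper name (not part of the hexdump extension), and the sample string with an embedded U+0001 is an assumed stand-in for your XML.

```ruby
# Hypothetical helper, not the hexdump extension's API: format a string's
# raw bytes as rows of up to 16 hex values, each prefixed with its offset.
def simple_hexdump(str)
  str.bytes.each_slice(16).with_index.map do |row, i|
    hex = row.map { |b| format('%02x', b) }.join(' ')
    format('%08x  %s', i * 16, hex)
  end
end

# Assumed sample input: a tiny XML fragment containing a control character.
puts simple_hexdump("<a>\u0001</a>")
```

Any byte below 20 in the output (other than 09, 0a and 0d) is a control character of the kind XML parsers tend to reject.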
Upvotes: 0
Reputation: 41757
The search term you are looking for is U+2603 or U2603, obviously substituting the numbers from your lamentably blurry "unknown glyph" box. The first several results will be about that Unicode character.
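If the glyph box is too blurry to read, Ruby can produce the search term for you. This is a small sketch, assuming the text has already been read into a valid UTF-8 string; `codepoint_labels` is a made-up helper name and the sample string is only for illustration.

```ruby
# Hypothetical helper: list every character of a string in U+XXXX notation.
def codepoint_labels(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }
end

# Assumed sample: the letter "a" followed by the snowman character.
puts codepoint_labels("a\u2603").join(' ')
```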
Upvotes: 2
Reputation: 35833
If you're using Vim, move the cursor over the character and type ga to show its decimal, hex, and octal values in the status line.
Upvotes: 0
Reputation: 19782
hexdump -c from the Terminal command line will show you the character code.
Upvotes: 0
Reputation: 3851
Open the file in a hex editor and extract the hexadecimal representation of the character. Then look up the code on http://unicode.org to find out the name of the character.
Upvotes: 0
Reputation: 127467
I would save the page in Firefox to a file and pass it to hexdump -C. Look for the fragment of HTML around it in the ASCII part, then look for the hex bytes. Most likely, these are UTF-8, so expect a multi-byte code.
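Once you have the hex bytes, turning a multi-byte UTF-8 sequence back into a code point can be done in Ruby. A minimal sketch, assuming the bytes really are UTF-8; `decode_utf8_bytes` is a hypothetical helper, and the bytes e2 98 83 are just an example sequence, not taken from your file.

```ruby
# Hypothetical helper: interpret a list of byte values as UTF-8 and
# return the code points they encode in U+XXXX notation.
def decode_utf8_bytes(byte_values)
  text = byte_values.pack('C*').force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'not valid UTF-8' unless text.valid_encoding?
  text.codepoints.map { |cp| format('U+%04X', cp) }
end

# Example bytes as they might appear in the hex column of hexdump -C.
puts decode_utf8_bytes([0xE2, 0x98, 0x83]).inspect
```

If the helper raises, the bytes are probably in some other encoding, or the sequence was cut in the wrong place.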
Upvotes: 5
Reputation: 29716
Your screenshot is tiny, but does the Firefox sample contain a glyph with 4 hexadecimal characters in it? If so, that's the Unicode character's code number. You could also hunt for that diamond glyph on the Unicode code charts, or simply copy the diamond into a Google search and the character name should turn up near the top.
But the real question is how to handle Unicode input in your program. You need to do that correctly if you're processing XML. Nokogiri is a Ruby library? I'm surprised to hear it doesn't handle Unicode automatically.
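Since the update says these turned out to be control characters, one way to locate them before parsing is to check each character against the ranges XML 1.0 permits. This is a sketch under that assumption, not a description of anything Nokogiri does; both helper names are hypothetical.

```ruby
# Code points allowed by the XML 1.0 "Char" production:
# tab, LF, CR, and the ranges below. Everything else is rejected.
def xml_valid_codepoint?(cp)
  cp == 0x9 || cp == 0xA || cp == 0xD ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

# Hypothetical helper: return [character index, "U+XXXX"] pairs for every
# character in the text that XML 1.0 does not allow.
def report_invalid_xml_chars(text)
  text.each_char.with_index
      .reject { |ch, _| xml_valid_codepoint?(ch.ord) }
      .map { |ch, i| [i, format('U+%04X', ch.ord)] }
end

# Assumed sample input containing two control characters.
p report_invalid_xml_chars("a\u0001b\u000Bc")
```

Whether to delete, replace, or escape the reported characters depends on what the data is supposed to mean, so the sketch only reports them.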
Upvotes: 4