Reputation: 8846

How do I figure out what this character is?

Update: Apparently these are control characters, not Unicode characters.

I'm trying to parse an XML file which has an odd character in it that makes it invalid and is causing my tools (Firefox, Nokogiri) to complain.

Here's what the character looks like in Firefox, and what it looks like when I copy and paste it into TextMate (I'm on OS X, obviously).

crazy characters http://img.skitch.com/20090811-ghu43k5u9nhpcjmh443dpq76jp.preview.jpg

Rather than just cryptic icons and little grey diamonds, I'd really like to know what these characters are (e.g. hex/dec codes), but I'm not sure how to figure that out.

Upvotes: 2

Answers (10)

OscarRyz

Reputation: 199224

Save the file, then inspect it from the terminal with:

od (octal dump); for example, od -c prints each byte as a character or an escape.

Upvotes: 0

Tilo

Reputation: 33732

You can download the Ruby hexdump extension for class String and print a hexdump directly from Ruby:

require 'hexdump'

#... whatever you do in your program

puts your_string.hexdump

The output looks like what you get from hexdump -C in a shell.

See:

Ruby Hexdump method for Class String
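If installing the extension is not an option, something similar can be sketched with the standard library alone. The helper name describe_chars and the sample string below are made up for illustration, and the sketch assumes the text is valid UTF-8 (ord raises on broken byte sequences):

```ruby
# Hypothetical helper: describe each character of a string by its
# code point (hex and decimal) and its raw bytes. Assumes valid UTF-8.
def describe_chars(text)
  text.each_char.map do |ch|
    bytes = ch.bytes.map { |b| format('%02X', b) }.join(' ')
    format('U+%04X  dec %-6d bytes %s', ch.ord, ch.ord, bytes)
  end
end

# Made-up sample: a letter, a control character, and U+2603.
sample = "a\u0003\u2603"
puts describe_chars(sample)
```

Each output line covers one character, so an odd character can be spotted by its code point rather than by the glyph the editor happens to draw.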

Upvotes: 0

joeforker

Reputation: 41757

The search term you are looking for is U+2603 or U2603, obviously substituting the numbers from your lamentably blurry "unknown glyph" box. The first several results will be about that Unicode character.

Upvotes: 2

Rhubarb

Reputation: 35833

If you're using Vim, move the cursor over the character and type ga to show its decimal, hex, and octal codes in the status area.

Upvotes: 0

Michael Speer

Reputation: 4952

Copy it into emacs and start hexl-mode.

Upvotes: 1

Mark Bessey

Reputation: 19782

hexdump -c from the Terminal command line will show you the character code (non-printing bytes appear as escapes or octal values).

Upvotes: 0

sebasgo

Reputation: 3851

Open the file in a hex editor and extract the hexadecimal representation of the character. Then look up the code on http://unicode.org to find out the name of the character.

Upvotes: 0

Martin v. Löwis

Reputation: 127467

I would save the page in Firefox to a file, and pass it to hexdump -C. Look for the fragment of HTML around it in the ASCII part, then look for the hex bytes. Most likely, these are UTF-8, so expect a multi-byte code.
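Since the question's update says these turned out to be control characters, the hunt through the hex bytes can also be automated. This is only a rough sketch: the method name illegal_control_bytes and the sample input are invented for illustration, and it only checks for bytes below 0x20 other than tab, line feed, and carriage return, which XML 1.0 does not allow as characters. It does not look at 0x7F or at multi-byte sequences.

```ruby
# Control bytes that XML 1.0 does allow: tab, line feed, carriage return.
ALLOWED_CONTROL_BYTES = [0x09, 0x0A, 0x0D].freeze

# Hypothetical helper: return [offset, byte] pairs for every other
# byte below 0x20. Scanning raw bytes is safe for UTF-8 input because
# all bytes of a multi-byte sequence are 0x80 or higher.
def illegal_control_bytes(data)
  hits = []
  data.each_byte.with_index do |byte, offset|
    hits << [offset, byte] if byte < 0x20 && !ALLOWED_CONTROL_BYTES.include?(byte)
  end
  hits
end

# Made-up sample input containing one stray control byte.
sample = "<a>ok\x03</a>\n"
illegal_control_bytes(sample).each do |offset, byte|
  puts format('offset 0x%08x: byte 0x%02X', offset, byte)
end
```

For a real file, the data could be read with File.binread(path) so that no encoding conversion gets in the way; the offsets then line up with the left-hand column of hexdump -C.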

Upvotes: 5

h0b0

Reputation: 1869

Simply open the file in a hex editor such as XVI32.

Upvotes: 0

Nelson

Reputation: 29716

Your screenshot is tiny, but does the Firefox sample contain a glyph with 4 hexadecimal characters in it? If so, that's the Unicode character's code number. You could also hunt for that diamond glyph on the Unicode code charts, or simply copy the diamond into a Google search and the character name should turn up near the top.

But the real question is how to handle Unicode input in your program. You need to do that correctly if you're processing XML. Nokogiri is a Ruby library? I'm surprised to hear it doesn't handle Unicode automatically.

Upvotes: 4
