File Encoding - What kind of encoding is this?

Question

I recently compiled a project with Maven 3.1. Right after that, the source files turned into this:

��

It's a sample from a CSS file. I'm using IntelliJ 13.

How can I turn this back into a human-readable format?

Floris · Accepted Answer

To expand on my last comment a little: I copied the text of your question into a text editor and saved it. Then I viewed it with the Mac/Linux command od -cx, which prints both the character and its hex representation where possible. For your question, the first few lines gave:

od -cx junk.txt 
0000000    I       r   e   c   e   n   t   l   y       c   o   m   p   i
             2049    6572    6563    746e    796c    6320    6d6f    6970
0000020    l   e   d       p   r   o   j   e   c   t       w   i   t   h
             656c    2064    7270    6a6f    6365    2074    6977    6874
0000040        m   a   v   e   n       3   .   1   .       J   u   s   t
             6d20    7661    6e65    3320    312e    202e    754a    7473
0000060        a   f   t   e   r       t   h   a   t       s   o   u   r
             6120    7466    7265    7420    6168    2074    6f73    7275
0000100    c   e       c   o   d   e   s       t   u   r   n   e   d    
             6563    6320    646f    7365    7420    7275    656e    2064
0000120    i   n   t   o       t   h   i   s   .  \n  \n  \n   �  **  **
             6e69    6f74    7420    6968    2e73    0a0a    ef0a    bdbf
0000140    �  **  **   �  **  **   �  **  **   �  **  **   �  **  **   �
             bfef    efbd    bdbf    bfef    efbd    bdbf    bfef    efbd
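The hex rows in the dump group the bytes into 16-bit words and print each word in the machine's native byte order. On a little-endian host (x86, and also Apple Silicon Macs) that makes every pair of bytes appear reversed. As a minimal sketch, assuming a little-endian host, the Python below rebuilds the first hex row of the dump from the text it came from; `od_x_words` is a hypothetical name, not part of any library.

```python
import struct


def od_x_words(data: bytes) -> list[str]:
    """Format bytes as 16-bit little-endian words, like `od -x` on a little-endian host."""
    if len(data) % 2:
        # Simplifying assumption: pad an odd trailing byte with zero to form a full word.
        data += b"\x00"
    count = len(data) // 2
    # "<" = little-endian, "H" = unsigned 16-bit; the first byte of each pair
    # becomes the LOW byte, so it is printed last.
    words = struct.unpack(f"<{count}H", data)
    return [f"{w:04x}" for w in words]


print(" ".join(od_x_words(b"I recently compi")))
# 2049 6572 6563 746e 796c 6320 6d6f 6970
```

The same function turns a newline followed by the bytes ef bf bd into `ef0a bdbf`, which is exactly how the end of the 0000120 row looks.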

As you can see, the bytes in the hex rows appear swapped: the first two bytes in the file are 0x49 0x20, representing "I" and a space, but they are shown "backwards" as 2049 (little-endian representation: od -x groups the bytes into 16-bit words and prints each word in the machine's native byte order). The same thing can be seen with the rest of the "readable" characters. When you get to the "unreadable" characters, you find that they consist (in correct sequence) of the bytes

0xef 0xbf 0xbd

repeated over and over again. This is the UTF-8 encoding of U+FFFD, the Unicode "replacement character" (see for example https://stackoverflow.com/a/4391782/1967396 or http://en.wikipedia.org/wiki/Specials_%28Unicode_block%29), which is used in place of a character that could not be decoded or displayed; it is rendered as a question mark in a diamond. Presumably, the multiple copy/paste operations from your original file to Stack Overflow caused this substitution. If you look at the original file with a binary dump, I suspect you will see a different byte sequence. You may be able to do something with it, but it may also simply be the way Maven mangled your file.
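The three bytes 0xef 0xbf 0xbd are simply U+FFFD written in UTF-8. The Python sketch below checks that, and shows one way such a sequence can end up in a file: bytes decoded with the wrong charset (assumed here to be Latin-1 text read as UTF-8) get U+FFFD in place of the undecodable byte, and saving the result writes ef bf bd, so the original byte is lost. Whether this is what happened to the CSS file is not established. `count_replacement_chars` is a hypothetical helper for checking a file's raw bytes, and the file name in the comment is a placeholder.

```python
# U+FFFD is the Unicode REPLACEMENT CHARACTER, usually shown as a question mark in a diamond.
REPLACEMENT = "\ufffd"

# In UTF-8 it takes exactly the three bytes seen in the dump.
encoded = REPLACEMENT.encode("utf-8")
print(encoded.hex(" "))  # ef bf bd

# A decoder that meets bytes it cannot interpret can substitute U+FFFD.
# Here 0xE9 ("é" in Latin-1) is not a complete UTF-8 sequence on its own.
latin1_bytes = "café".encode("latin-1")
decoded = latin1_bytes.decode("utf-8", errors="replace")

# Saving that text as UTF-8 stores ef bf bd; the original byte 0xE9 is gone.
resaved = decoded.encode("utf-8")
print(resaved.hex(" "))  # 63 61 66 ef bf bd


def count_replacement_chars(data: bytes) -> int:
    """Hypothetical helper: count UTF-8-encoded U+FFFD sequences in raw bytes."""
    return data.count(encoded)


# Usage on a file (the path is a placeholder):
# with open("style.css", "rb") as f:
#     print(count_replacement_chars(f.read()))
print(count_replacement_chars(resaved))  # 1
```

Reading the file in binary mode ("rb") matters here: it counts the bytes actually on disk rather than whatever an editor or clipboard decoded them to.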
