Reputation: 28293
I ran into something a bit weird with the hexl-mode under Emacs (GNU Emacs 22.2.1 / Debian GNU Linux).
I had a UTF-8 text file to which I wanted to prepend a BOM (Byte Order Mark: even though adding a pointless BOM to a UTF-8 file is not recommended, the spec clearly specifies that a BOM in a UTF-8 file is legal).
Here's how the file is seen by the file command:
...$ file /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode English text
The following works:
open the UTF8 file (without BOM) in text mode
add three ASCII characters at the beginning of the file
close the file (<-- see, very important, I need to close the file)
M-x hexl-mode
M-x hexl-find-file (re-opening the file but this time in hexl-mode)
M-x hexl-insert-hex-string
EFBBBF
C-x C-s (saving the file)
M-x hexl-mode-exit
I then get a UTF-8 file with a BOM, as shown here by the file command:
...$ file /tmp/test.txt
/tmp/test.txt: UTF-8 Unicode (with BOM) English text
(note that the file command heuristically detects this as UTF-8 with BOM "English text", but the file does contain a lot of Euro symbols: my point is that, before adding the BOM, it is NOT an ASCII file but already a UTF-8 file, as shown above)
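For comparison, the same byte-level result can be sketched outside Emacs. This is only a minimal Python illustration of what the steps above achieve, not what hexl-mode does internally; the helper name prepend_utf8_bom and the throwaway demo file are made up for the example.

```python
import codecs
import os
import tempfile


def prepend_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file at `path` unless it already has one.

    Hypothetical helper for illustration. Returns True if the file was changed.
    """
    # Read raw bytes so that no character conversion takes place.
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


# Demo on a throwaway file that already contains a non-ASCII character.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("Price: 10 \u20ac\n".encode("utf-8"))

changed = prepend_utf8_bom(demo_path)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result[:3].hex())  # the three BOM bytes in hex
```

The existing content is left byte-for-byte intact after the three BOM bytes, which is the same property the hexl-find-file route preserves.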
However, I simply cannot open the file in Emacs first, then call hexl-mode, replace the first three characters with 0xEF 0xBB 0xBF (the BOM), and then save.
Apparently there are crazy conversion issues taking place when switching from (Text) to (Hexl) mode.
Am I missing something obvious, or is converting between Text and Hexl a bit broken, so that I'm better off switching to hexl-mode first, doing my hex editing, then saving and closing the file and re-opening it in text mode?
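To make the character-versus-byte distinction concrete, here is a small Python sketch. It is only an assumption about the kind of conversion that could be involved, not a description of Emacs internals: it shows that characters and bytes do not line up one to one in UTF-8, and what happens if the three BOM bytes are treated as three separate characters and then re-encoded.

```python
# A short sample with three ASCII placeholder characters and one Euro sign.
text = "abc10 \u20ac"
raw = text.encode("utf-8")

# Character count and byte count differ because the Euro sign takes 3 bytes.
print(len(text), len(raw))

# The BOM is the single character U+FEFF; in UTF-8 it encodes to three bytes.
bom = "\ufeff".encode("utf-8")
print(bom.hex())

# If those three byte values are instead taken as three separate characters
# (U+00EF, U+00BB, U+00BF) and saved as UTF-8, each one becomes two bytes,
# so the file would start with six bytes that are not a BOM.
mangled = "\u00ef\u00bb\u00bf".encode("utf-8")
print(mangled.hex())
```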
Upvotes: 3
Views: 1988
Reputation: 27428
Note that an xml file with this tag will be silently converted to utf-16 big endian on saving.
<?xml version="1.0" encoding="UTF-16"?>
This would automatically make the file utf8 with bom after changing and saving it:
<?xml version="1.0" encoding="UTF-8"?>
Upvotes: 0
Reputation: 21162
If you take a look at the hexl-find-file code, you will see that it calls find-file-literally and then switches to hexl-mode.
From the documentation of find-file-literally:
Visit file FILENAME with no conversion of any kind. Format conversion and character code conversion are both disabled, and multibyte characters are disabled in the resulting buffer.
So you may open your file with find-file-literally, add 3 characters, and then switch to hexl-mode.
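As a rough analogy outside Emacs, "no conversion of any kind" corresponds to working on the file in binary mode. The Python sketch below assumes three ASCII placeholder bytes were already added at the start of the file and overwrites them in place with the BOM; the function name overwrite_prefix_with_bom and the placeholder value b"xxx" are hypothetical choices for the example.

```python
import codecs
import os
import tempfile


def overwrite_prefix_with_bom(path, placeholder=b"xxx"):
    """Overwrite a 3-byte placeholder at the start of `path` with the UTF-8 BOM.

    Hypothetical helper: the file is opened in binary mode, so no character
    conversion happens and the file length stays the same.
    """
    if len(placeholder) != len(codecs.BOM_UTF8):
        raise ValueError("placeholder must be exactly 3 bytes long")
    with open(path, "r+b") as f:
        head = f.read(len(placeholder))
        if head != placeholder:
            raise ValueError("file does not start with the expected placeholder")
        f.seek(0)
        f.write(codecs.BOM_UTF8)


# Demo: a UTF-8 file that starts with three ASCII placeholder characters.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
original = b"xxx" + "10 \u20ac\n".encode("utf-8")
with os.fdopen(fd, "wb") as f:
    f.write(original)

overwrite_prefix_with_bom(demo_path)
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)

print(patched[:3].hex())
```

Because the placeholder and the BOM are both three bytes, the overwrite does not shift the rest of the file, which mirrors replacing three single-byte characters in a literal (unibyte) buffer.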
Upvotes: 3