Reputation: 2329
I need to offer a plain text file for download. The file must be UTF-8 encoded and must start with a BOM. I saved my PHP file as UTF-8 without BOM and send the following headers:
header('HTTP/1.1 200 OK');
header('Content-Type: text/plain; charset=utf-8');
header('Content-Disposition: attachment; filename="test.txt"');
I save the script without a BOM because it would interfere with sending the headers. So I tried emitting the BOM manually with:
echo chr(239).chr(187).chr(191);
Then I output my text. Without the manual BOM, an editor like Notepad++ recognizes the file as ANSI encoded. With the supposed manual BOM, it is recognized as UTF-8 but contains the characters:
ï»¿
at the start. So I assume it is detected to be UTF-8 by means of heuristics and my manual BOM is wrong.
How do I do it right?
EDIT: Hex contents as requested. I set the text to "SOME TEXT" and get:
C3 AF C2 BB C2 BF 53 4F 4D 45 20 54 45 58 54
Saving "SOME TEXT" as UTF-8 with BOM yields:
EF BB BF 53 4F 4D 45 20 54 45 58 54
Upvotes: 7
Views: 18687
Reputation: 36214
Check your mbstring extension's settings; it can be set up to automatically re-encode output:
; This directive specifies the regex pattern of content types for which mb_output_handler()
; is activated.
; Default: mbstring.http_output_conv_mimetype=^(text/|application/xhtml\+xml)
; mbstring.http_output_conv_mimetype=
Both "\xEF\xBB\xBF" and chr(239).chr(187).chr(191) can be used to generate the BOM. You can try both with file_put_contents() on your own.
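A minimal sketch of that check, assuming a writable temp directory (the scratch file and variable names are only for illustration). It writes the chr() version of the BOM plus "SOME TEXT" to a file and dumps the bytes, so you can compare them with the hex in the question:

```php
<?php
// Two spellings of the UTF-8 BOM; both should be the bytes EF BB BF.
$bomLiteral = "\xEF\xBB\xBF";
$bomChr     = chr(239) . chr(187) . chr(191);

// Hypothetical scratch file, removed again after reading it back.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, $bomChr . 'SOME TEXT');
$hex = strtoupper(bin2hex(file_get_contents($path)));
unlink($path);

echo $hex, "\n";                      // raw bytes of the file as hex
var_dump($bomLiteral === $bomChr);    // are the two spellings identical?
```

If the file on disk starts with EF BB BF but the download does not, the bytes are probably being changed somewhere between echo and the client, for example by an output handler.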
Upvotes: 0
Reputation: 346260
What you're seeing is the result of interpreting the individual bytes of the BOM as ISO-8859-1 and then encoding the result in UTF-8. As for why this happens, I suspect the chr() function. Try using char literals instead, i.e.
echo "\xEF\xBB\xBF";
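To illustrate the double encoding, here is a rough sketch that treats each byte as an ISO-8859-1 character and re-encodes it as UTF-8 by hand. The helper name latin1ToUtf8 is made up for this example and is not a PHP built-in; it only shows what such a conversion does to the BOM, not where in your setup the conversion actually happens:

```php
<?php
// Hypothetical helper: treat every input byte as an ISO-8859-1 character
// and encode it as UTF-8 (one byte below 0x80, two bytes otherwise).
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                     // ASCII stays as-is
        } else {
            $out .= chr(0xC0 | ($b >> 6))        // lead byte
                  . chr(0x80 | ($b & 0x3F));     // continuation byte
        }
    }
    return $out;
}

$original = "\xEF\xBB\xBF" . 'SOME TEXT';
$mangled  = latin1ToUtf8($original);

echo strtoupper(bin2hex($original)), "\n";
echo strtoupper(bin2hex($mangled)), "\n";
```

The second line should correspond to the C3 AF C2 BB C2 BF ... sequence from the question, which is why the editor shows ï»¿ at the start of the file.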
Upvotes: 1