I'm trying to write a text file encoded in UTF-8 with JavaScript.
I need to create this text file from the command line, so my code looks like this:
My script.js:
const fs = require('fs');
const text = 'this is test text';
fs.writeFileSync('./test.txt', text, 'utf8');
My package.json:
{
"name": "test-project",
"version": "0.1.0",
"private": true,
"dependencies": {
"@babel/cli": "^7.8.4",
"@babel/core": "^7.9.0",
"@babel/plugin-transform-runtime": "^7.9.0",
"@babel/preset-env": "^7.9.5",
"@babel/preset-react": "^7.9.4",
"@babel/register": "^7.9.0",
"@testing-library/jest-dom": "^4.2.4",
"@testing-library/react": "^9.3.2",
"@testing-library/user-event": "^7.1.2",
"axios": "^0.19.2",
"bootstrap": "^4.4.1",
"glob": "^7.1.6",
"jquery": "^3.4.1",
"react": "^16.12.0",
"react-bootstrap": "^1.0.0-beta.17",
"react-dom": "^16.12.0",
"react-helmet": "^5.2.1",
"react-router-dom": "^5.1.2",
"react-router-sitemap": "^1.2.0",
"react-scripts": "3.4.0",
"react-table": "^7.0.0-rc.16",
"recharts": "^2.0.0-beta.1"
},
"scripts": {
"start": "react-scripts start",
"build": "react-scripts build",
"test": "react-scripts test",
"eject": "react-scripts eject",
"sitemap": "node src/sitemap.js"
},
"eslintConfig": {
"extends": "react-app"
},
"browserslist": {
"production": [
">0.2%",
"not dead",
"not op_mini all"
],
"development": [
"last 1 chrome version",
"last 1 firefox version",
"last 1 safari version"
]
}
}
Then I run these commands in the terminal:
$ node script.js
$ file --mime test.txt
test.txt: text/plain; charset=us-ascii
Problem:
The file created by fs.writeFileSync is encoded in US-ASCII, not UTF-8.
How can I write the file in UTF-8?
*NOTE: I'm using a Japanese PC. Could that affect the encoding of the file?
*NOTE2: I also tried the following, and the result is the same:
const fs = require('fs');
const stream = fs.createWriteStream('./test.txt', 'utf8');
stream.once('open', () => {
  stream.write('this is test text');
  stream.end(); // flush and close the file
});
Answer:
"fs.writeFileSync doesn't write file in UTF-8"
Actually, it did. US-ASCII is a subset of UTF-8: characters with codes 0 through 127 are encoded identically in both. So your file is both US-ASCII and UTF-8.
For plain ASCII characters (codes 127 and below), there is no byte-level difference between UTF-8 and US-ASCII. In UTF-8, each US-ASCII character is encoded as the same single byte it has in US-ASCII.
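To see this concretely, here is a minimal Node.js sketch that encodes the question's ASCII-only text with both the 'ascii' and 'utf8' Buffer encodings and compares the resulting bytes. The variable names are just for illustration:

```javascript
// The question's ASCII-only sample text.
const text = 'this is test text';

// Encode the same string two ways.
const asciiBytes = Buffer.from(text, 'ascii');
const utf8Bytes = Buffer.from(text, 'utf8');

// Buffer#equals compares the underlying bytes.
console.log(asciiBytes.equals(utf8Bytes)); // true
console.log(utf8Bytes.length);             // 17: one byte per character
```

Both encodings produce exactly the same 17 bytes, so nothing in the file could tell a reader which one was "used".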
A plain-text file like the one you're writing doesn't typically record which character set it uses. It's up to the reading software either to infer the encoding from the bytes it finds or to guess from other clues, such as the file extension. So the file command is just telling you that your file meets all the requirements of US-ASCII and therefore looks like US-ASCII, which happens to be a subset of UTF-8.
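As a rough illustration of that kind of inference, the sketch below guesses a charset label from raw bytes. The guessCharset function is hypothetical and far simpler than what the file command actually does; it assumes Node 11 or later with a standard ICU-enabled build, where TextDecoder is a global and supports the fatal option:

```javascript
// Hypothetical helper: guesses a charset label from raw bytes.
// This is NOT how the file command works internally; it only shows the idea.
function guessCharset(bytes) {
  // Every byte in 0x00-0x7F: valid US-ASCII, and therefore also valid UTF-8.
  if (bytes.every((b) => b <= 0x7f)) {
    return 'us-ascii';
  }
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of
    // substituting U+FFFD replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return 'utf-8';
  } catch {
    return 'unknown';
  }
}

console.log(guessCharset(Buffer.from('this is test text', 'utf8'))); // us-ascii
console.log(guessCharset(Buffer.from('テスト', 'utf8')));            // utf-8
console.log(guessCharset(Buffer.from([0xff, 0xfe, 0x41])));         // unknown
```

Because the ASCII check runs first, ASCII-only data gets the narrower label even though it is also valid UTF-8, which matches the behavior described above.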
Put some Japanese characters in the text and the result will look different, since they don't fit into US-ASCII. UTF-8 encodes each of them with multiple bytes (three bytes for most Japanese characters).
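For example, this sketch writes the question's text plus three katakana characters and compares the character count with the byte count. The sample string and the temp-file name test-ja.txt are assumptions for illustration only:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sample text: the question's ASCII text plus three katakana characters.
const text = 'this is test text テスト';

// Write to a hypothetical temp file, then read the raw bytes back.
const file = path.join(os.tmpdir(), 'test-ja.txt');
fs.writeFileSync(file, text, 'utf8');
const bytes = fs.readFileSync(file); // no encoding given, so this returns a Buffer
fs.unlinkSync(file);

console.log(text.length);  // 21 characters
console.log(bytes.length); // 27 bytes: 18 ASCII bytes + 3 bytes per katakana character

// The first katakana character, テ (U+30C6), becomes three bytes in UTF-8.
console.log(Buffer.from('テ', 'utf8').toString('hex')); // e38386
```

Running file --mime on a file like that should then report charset=utf-8 instead of us-ascii.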