Reputation: 133
I want to read user's file and gave him modified version of this file. I use input with type file to get text file, but how I can get charset of loaded file, because in different cases it can be various... Uploaded file has format .txt or something similar and isn't .html :)
var handler = document.getElementById('handler');
var reader = new FileReader();
handler.addEventListener('click', function() {
reader.readAsText(firstSub.files[0], /* Here I need use a correctly charset */);
});
reader.addEventListener("loadend", function() {
console.dir(reader.result.split('\n'));
});
Upvotes: 5
Views: 11405
Reputation: 2485
The other solutions didn't work for what I was trying to do, so I decided to create my own module that can detect the charset and language of text files.
You load it via the <script>
tag and then use the languageEncoding
function to retrieve the charset/encoding:
// index.html
<script src="https://unpkg.com/detect-file-encoding-and-language/umd/language-encoding.min.js"></script>
// app.js
languageEncoding(file).then(fileInfo => console.log(fileInfo));
// Possible result: { language: english, encoding: UTF-8, confidence: { language: 0.96, encoding: 1 } }
For a more complete example/instructions check out this part of the documentation!
Upvotes: 1
Reputation: 859
In my case (I made a small web app that accepts subtitle .srt files and removes time codes and line breaks, making a printable text), it was enough to foresee 2 types of encoding: UTF-8 and CP1251 (in all cases I tried – with both Latin and Cyrillic letters – these two types are enough). At first I try encoding with UTF-8, and if it is not successful, some characters are replaced by '�'-signs. So, I check the result for presence of these signs, and, if found, the procedure is repeated with CP1251 encoding. So, here is my code:
function onFileInputChange(inputDomElement, utf8 = true) {
const file = inputDomElement.files[0];
const reader = new FileReader();
reader.readAsText(file, utf8 ? 'UTF-8' : 'CP1251');
reader.onload = () => {
const result = reader.result;
if (utf8 && result.includes('�')) {
onFileInputChange(inputDomElement, false);
console.log('The file encoding is not utf-8! Trying CP1251...');
} else {
document.querySelector('#textarea1').value = file.name.replace(/\.(srt|txt)$/, '').replace(/_+/g, '\ ').toUpperCase() + '\n' + result;
}
}
}
Upvotes: 7
Reputation: 1146
You should check out this library encoding.js
They also have a working demo. I would suggest you first try it out with the files that you'll typically work with to see if it detects the encoding correctly and then use the library in your project.
Upvotes: 4