Reputation: 5532
I'm writing a small JavaScript function that shows a preview of a file if its content can be inserted into a div.innerText, so I'm looking for all text-based MIME types.
For example, I know that text/* types all satisfy this requirement, but there's also application/json, for example.
The expected result is knowing whether a file contains enough English-language alphanumeric characters to justify opening it as a readable text file. It must contain words that make sense to an English-language reader (such as the, and, of, etc., including JSON symbols like ], etc.).
Upvotes: 5
Views: 10085
Reputation: 16666
A while ago, Google Chrome changed the behavior of its DevTools to no longer display multipart/mixed payloads in the network trace, because their content is not guaranteed to be readable text. A bug report was quickly filed, and its outcome was:
The checks (isTextType in MimeType.ts) were changed to allow more MIME types, including all multipart types.
The reasoning behind this is that certain non-textual MIME types still contain parts that are decodable as UTF-8, and users will value being able to read these parts, like the XML fragment at the end of the image/webp response in the example below. Compare its presentation in the div and in the Chrome DevTools.
fetch("https://httpbin.org/image/webp")
.then(r => r.text())
.then(t => d.innerText = t);
<div id="d"></div>
To answer the original question: simply loading any response into div.innerText gives a useful preview.
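For anyone who still wants a type-based filter, the kind of permissive check described above could be sketched roughly as follows. This is not Chrome's actual isTextType code; the function name isProbablyTextMimeType and the list of accepted types are assumptions made up for illustration.

```javascript
// Hypothetical helper (not Chrome's implementation): decides from the MIME
// type alone whether a payload is worth showing as text.
function isProbablyTextMimeType(mimeType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const essence = mimeType.split(";")[0].trim().toLowerCase();

  // All text/* and, following the DevTools reasoning, all multipart/* types.
  if (essence.startsWith("text/") || essence.startsWith("multipart/")) {
    return true;
  }

  // Structured-syntax suffixes, e.g. application/ld+json or image/svg+xml.
  if (/\+(json|xml)$/.test(essence)) {
    return true;
  }

  // A few well-known textual application/* types (assumed list, extend as needed).
  const knownTextTypes = new Set([
    "application/json",
    "application/javascript",
    "application/xml",
    "application/x-www-form-urlencoded",
  ]);
  return knownTextTypes.has(essence);
}
```

A type such as image/webp is rejected by this sketch even though part of its body may be readable, which is exactly the trade-off the bug report was about.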
Upvotes: 0
Reputation: 18528
You can find lists of text-based MIME types in various resources such as the MDN docs or Wikipedia.
However, MIME types are sometimes set incorrectly.
Here's an alternative approach.
function isTextFile(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = function (event) {
      const uint8Array = new Uint8Array(event.target.result);
      for (let i = 0; i < uint8Array.length; i++) {
        const byte = uint8Array[i];
        // Control characters other than tab (9), LF (10) and CR (13)
        // rarely appear in text, so treat them as a sign of binary data
        if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) {
          resolve(false); // Likely a binary file
          return;
        }
      }
      resolve(true); // Likely a text file
    };
    reader.onerror = reject;
    // Read only the first 512 bytes (or the entire file if smaller)
    // instead of loading the whole file into memory
    reader.readAsArrayBuffer(file.slice(0, 512));
  });
}
Or a hybrid function that checks the MIME type first and falls back to the content heuristic:
async function isTextContent(file) {
  // Step 1: MIME type check
  const textMimeTypes = [
    "text/",
    "application/json",
    "application/javascript",
    "application/xml",
    "application/ld+json",
    "application/yaml",
    "message/"
  ];
  if (textMimeTypes.some((type) => file.type.startsWith(type))) {
    return true;
  }
  // Step 2: Content-based heuristic check
  return await isTextFile(file);
}
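The byte check above only looks for control characters, so a sample full of bytes above 127 that are not valid UTF-8 would still pass. A stricter variant, sketched below, also requires the sample to decode as UTF-8. The name looksLikeUtf8Text is hypothetical, and it takes a Uint8Array (for example the first 512 bytes of the file) rather than a File.

```javascript
// Hypothetical helper: true if the sampled bytes decode as UTF-8 and contain
// no control characters other than tab, LF and CR.
function looksLikeUtf8Text(bytes) {
  let text;
  try {
    // fatal: true makes decode() throw a TypeError on malformed UTF-8.
    // stream: true tolerates a multi-byte character cut off at the end of
    // the sample, which is expected when only a prefix of the file is read.
    const decoder = new TextDecoder("utf-8", { fatal: true });
    text = decoder.decode(bytes, { stream: true });
  } catch {
    return false; // Not valid UTF-8, so likely binary
  }
  // Reject C0 control characters (except \t, \n, \r) and DEL.
  return !/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/.test(text);
}
```

In the hybrid function this could replace step 2, for instance by calling it on new Uint8Array(await file.slice(0, 512).arrayBuffer()). Note that it assumes UTF-8, so text in other encodings such as UTF-16 would be reported as binary.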
Upvotes: 0