yelsayed

Reputation: 5532

What mime types are plain text?

I'm writing a small JavaScript function that shows a preview of a file if its content can be inserted into a div's innerText, so I'm looking for all text-based MIME types.

For example, I know that all text/* types satisfy this requirement, but there's also application/json.

The expected result is knowing whether a file contains enough English-language alphanumeric characters to justify opening it as a readable text file. It must contain words that make sense to an English-language reader (such as "the", "and", "of", etc., along with JSON symbols like ]).

Upvotes: 5

Views: 10085

Answers (2)

Heiko Theißen

Reputation: 16666

A while ago, Google Chrome changed the behavior of their DevTools to no longer display multipart/mixed payloads in the network trace, because their content is not guaranteed to be readable text. A bug report was quickly filed, and its outcome was:

  • The rules for determining text types (which you may want to follow, see function isTextType in MimeType.ts) were changed to allow more mime types, including all multipart types.
  • The network trace offers a side-by-side view of non-text responses in hex and (attempted) UTF-8 decoding (to be joined by a base64 view in Chrome 132, see here).

The reasoning behind this is that certain non-textual mime types still contain parts that are decodable as UTF-8, and users will value being able to read these parts, like the XML fragment at the end of the image/webp response in the example below. Compare its presentation in the div and in the Chrome DevTools.

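// Decode a non-text (image/webp) response as text and show it in the div
fetch("https://httpbin.org/image/webp")
  .then(r => r.text())
  .then(t => { document.getElementById("d").innerText = t; });
<div id="d"></div>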

To answer the original question: Simply loading any response into div.innerText gives a useful preview.
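The side-by-side hex and attempted UTF-8 view described above can be approximated in a few lines. This is only a rough sketch, not how DevTools implements it; the function name hexPreview and the 16-bytes-per-row layout are assumptions made for illustration. Each row is decoded on its own, so a multi-byte character that straddles two rows shows up as replacement characters.

```javascript
// Hypothetical helper: render bytes as rows of "offset  hex  decoded text".
function hexPreview(bytes, bytesPerRow = 16) {
  // Non-fatal decoder: invalid UTF-8 sequences become U+FFFD instead of throwing.
  const decoder = new TextDecoder("utf-8");
  const rows = [];
  for (let offset = 0; offset < bytes.length; offset += bytesPerRow) {
    const chunk = bytes.subarray(offset, offset + bytesPerRow);
    const hex = Array.from(chunk, b => b.toString(16).padStart(2, "0")).join(" ");
    // Show control characters as "." so the text column stays on one line.
    const text = decoder.decode(chunk).replace(/[\x00-\x1f\x7f]/g, ".");
    const offsetLabel = offset.toString(16).padStart(8, "0");
    rows.push(`${offsetLabel}  ${hex.padEnd(bytesPerRow * 3 - 1)}  ${text}`);
  }
  return rows;
}
```

In a browser, the bytes could come from `new Uint8Array(await response.arrayBuffer())`, and the rows could be joined with newlines and placed in a pre element.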

Upvotes: 0

Mayank Kumar Chaudhari

Reputation: 18528

You can find this information in resources such as the MDN docs and Wikipedia.

However, MIME types are sometimes set incorrectly.

Here's an alternative approach that inspects the file's content instead.

function isTextFile(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = function (event) {
            const bytes = new Uint8Array(event.target.result);
            // Any control character other than tab (9), line feed (10),
            // or carriage return (13) suggests binary content.
            for (const byte of bytes) {
                if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) {
                    resolve(false); // Likely a binary file
                    return;
                }
            }
            resolve(true); // Likely a text file
        };
        reader.onerror = () => reject(reader.error);
        // Read only the first 512 bytes (or the entire file if smaller)
        reader.readAsArrayBuffer(file.slice(0, 512));
    });
}
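The control-character check above accepts bytes of 128 and higher without looking at them, so binary data that happens to avoid low bytes in the sampled range would pass. One possible refinement, sketched here under the assumption that the text is UTF-8, is to require that the sample also decodes as valid UTF-8. The name looksLikeUtf8Text is hypothetical, and UTF-16 text would be rejected by this check.

```javascript
// Hypothetical helper: true if the bytes are valid UTF-8 without stray control characters.
function looksLikeUtf8Text(bytes) {
  // fatal: true makes decode() throw a TypeError on invalid UTF-8.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  let text;
  try {
    // stream: true is meant to tolerate a multi-byte character
    // that was cut off at the end of a truncated sample.
    text = decoder.decode(bytes, { stream: true });
  } catch {
    return false; // Not valid UTF-8, so likely binary
  }
  // Reject control characters other than tab, line feed, and carriage return.
  return !/[\x00-\x08\x0b\x0c\x0e-\x1f]/.test(text);
}
```

It could be called from the onload handler with the same Uint8Array, in place of the byte loop.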

Or a hybrid function that checks the MIME type first and falls back to the content heuristic:

async function isTextContent(file) {
    // Step 1: MIME type check
    const textMimeTypes = [
        "text/",
        "application/json",
        "application/javascript",
        "application/xml",
        "application/ld+json",
        "application/yaml",
        "message/"
    ];
    if (textMimeTypes.some((type) => file.type.startsWith(type))) {
        return true;
    }

    // Step 2: Content-based heuristic check
    return await isTextFile(file);
}
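The prefix list above does not cover types such as image/svg+xml or application/vnd.api+json, and a Content-Type header taken from an HTTP response may carry parameters like "; charset=utf-8". A possible variant of the MIME check, offered only as a sketch, strips parameters and also accepts the structured syntax suffixes +json, +xml, and +yaml. The names isTextMimeType and TEXT_APPLICATION_TYPES are hypothetical, and the set of accepted types is an assumption you would adjust to your needs.

```javascript
// Assumed set of exact application/* types that are treated as text.
const TEXT_APPLICATION_TYPES = new Set([
  "application/json",
  "application/javascript",
  "application/xml",
  "application/yaml",
]);

// Hypothetical helper: decide from a MIME type or Content-Type value alone.
function isTextMimeType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  if (mime.startsWith("text/") || mime.startsWith("message/")) {
    return true;
  }
  // Structured syntax suffixes, e.g. application/ld+json or image/svg+xml.
  if (/\+(json|xml|yaml)$/.test(mime)) {
    return true;
  }
  return TEXT_APPLICATION_TYPES.has(mime);
}
```

An empty type, which browsers report for files they cannot classify, returns false here, so the content-based check would still be needed as a fallback.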

Upvotes: 0
