Reputation: 5532

What mime types are plain text?

I'm writing a small JavaScript function that shows a preview of a file if its content can be inserted into div.innerText, so I'm looking for all text-based MIME types.

I know that every text/* type satisfies this requirement, but there are others as well, such as application/json.

The expected result is knowing whether a file contains enough English-language alphanumeric characters to justify opening it as a readable text file. It must contain words that make sense to an English-language reader (such as "the", "and", "of", etc.), possibly alongside syntax characters like the ] found in JSON.

Upvotes: 5

Answers (2)

Heiko Theißen

Reputation: 16666

A while ago, Google Chrome changed the behavior of their DevTools to no longer display multipart/mixed payloads in the network trace, because their content is not guaranteed to be readable text. A bug report was quickly filed, and its outcome was:

The rules for determining text types (which you may want to follow, see function isTextType in MimeType.ts) were changed to allow more mime types, including all multipart types.
The network trace offers a side-by-side view of non-text responses in hex and (attempted) UTF-8 decoding (to be joined by a base64 view in Chrome 132, see here).
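
A MIME-type check in the spirit of those rules might look like the sketch below. The function name and the exact rule set are assumptions for illustration; this is not a copy of isTextType, so consult MimeType.ts for the rules Chrome actually applies.

```javascript
// Hypothetical helper: guess from the MIME type alone whether a payload is text.
// The rules below are illustrative assumptions, not Chrome's actual rule set.
function looksLikeTextMimeType(mimeType) {
    // Drop parameters such as "; charset=utf-8" and normalize case.
    const essence = mimeType.split(";")[0].trim().toLowerCase();
    if (essence.startsWith("text/") || essence.startsWith("multipart/")) {
        return true;
    }
    // Structured syntax suffixes, e.g. application/ld+json or image/svg+xml.
    if (essence.endsWith("+json") || essence.endsWith("+xml")) {
        return true;
    }
    // A short, non-exhaustive list of text-based application types.
    const knownTextTypes = new Set([
        "application/json",
        "application/xml",
        "application/javascript",
        "application/x-www-form-urlencoded"
    ]);
    return knownTextTypes.has(essence);
}
```

With these assumed rules, "text/html; charset=utf-8" and "multipart/mixed" count as text, while "image/webp" does not.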

The reasoning behind this is that certain non-textual mime types still contain parts that are decodable as UTF-8, and users will value being able to read these parts, like the XML fragment at the end of the image/webp response in the example below. Compare its presentation in the div and in the Chrome DevTools.

fetch("https://httpbin.org/image/webp")
.then(r => r.text())
.then(t => d.innerText = t);

<div id="d"></div>

To answer the original question: Simply loading any response into div.innerText gives a useful preview.
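
If you still want a rough measure of how readable such a preview will be, one option is to decode the bytes yourself and count how many characters could not be decoded. The sketch below assumes the content is meant to be UTF-8; replacementRatio is a hypothetical helper name, and what ratio counts as "readable enough" is left to you.

```javascript
// Hypothetical helper: fraction of decoded characters that are U+FFFD,
// the replacement character TextDecoder emits for invalid UTF-8 input.
function replacementRatio(bytes) {
    const text = new TextDecoder("utf-8").decode(bytes);
    let total = 0;
    let replaced = 0;
    // for...of iterates by code point, so astral characters count once.
    for (const ch of text) {
        total++;
        if (ch === "\uFFFD") {
            replaced++;
        }
    }
    return total === 0 ? 0 : replaced / total;
}
```

In the browser, the bytes could come from r.arrayBuffer() instead of r.text(), wrapped in a Uint8Array before being passed to the function.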

Upvotes: 0

Mayank Kumar Chaudhari

Reputation: 18528

You can find this information in resources such as the MDN docs and Wikipedia.

However, MIME types are sometimes set incorrectly.

Here's an alternative approach that inspects the file's content instead.

function isTextFile(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => {
            const bytes = new Uint8Array(reader.result);
            // Control characters (below 32) other than tab, line feed, and
            // carriage return suggest a binary file.
            const isBinary = bytes.some(
                (byte) => byte < 32 && byte !== 9 && byte !== 10 && byte !== 13
            );
            resolve(!isBinary); // true: likely a text file
        };
        reader.onerror = () => reject(reader.error);
        // Read only the first 512 bytes (or the entire file if it is smaller)
        reader.readAsArrayBuffer(file.slice(0, 512));
    });
}

Or use a hybrid function that checks the MIME type first and falls back to the content check:

async function isTextContent(file) {
    // Step 1: MIME type check
    const textMimeTypes = [
        "text/",
        "application/json",
        "application/javascript",
        "application/xml",
        "application/ld+json",
        "application/yaml",
        "message/"
    ];
    if (textMimeTypes.some((type) => file.type.startsWith(type))) {
        return true;
    }

    // Step 2: Content-based heuristic check
    return await isTextFile(file);
}
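
Note that the control-byte heuristic accepts every byte of 32 or above, so byte sequences that are not valid UTF-8 still pass. A stricter variant, sketched here under the assumption that the files are expected to be UTF-8, lets TextDecoder reject invalid input. isValidUtf8 is a hypothetical helper name.

```javascript
// Hypothetical helper: true if the bytes form a complete, valid UTF-8 sequence.
function isValidUtf8(bytes) {
    try {
        // With fatal: true, decode() throws a TypeError on invalid input
        // instead of substituting U+FFFD.
        new TextDecoder("utf-8", { fatal: true }).decode(bytes);
        return true;
    } catch (error) {
        return false;
    }
}
```

If you only pass a prefix of the file, the cut may fall in the middle of a multi-byte character, which this check reports as invalid, so treat a failure on a truncated sample with some caution.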

Upvotes: 0
