How to check if a Node.js `Buffer` contains valid UTF-8?

Question

I have a Buffer object and I would like to check if all of it is valid UTF-8. Ideally, I would like to get a string with said decoded UTF-8 text, too.

I tried `Buffer.toString`, which takes an encoding argument that defaults to `utf8`. Unfortunately, the docs say this:

If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.

That's not what I want: I would rather get an exception or a boolean flag. Checking whether the resulting string contains U+FFFD is not equivalent, because the input text could already have contained U+FFFD as a perfectly valid Unicode code point. One could count the occurrences of U+FFFD in the buffer and in the string and compare them, but that seems needlessly complicated and inefficient.

Is there a better way?

Константин Ван · Accepted Answer

import NodeBuffer, {Buffer} from "node:buffer";

NodeBuffer.isUtf8(input)

Added in: version 19.4.0, version 18.14.0.

input (Buffer | ArrayBuffer | TypedArray)

This function returns true if input contains only valid UTF-8-encoded data, including the case in which input is empty.

Throws if the input is a detached array buffer.
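
A minimal sketch of how this might be used, assuming an ES module running on a Node.js version that ships `isUtf8`. Since the question also asks for the decoded string, the sketch pairs the check with the standard `TextDecoder` in `fatal` mode, which throws a `TypeError` on invalid input instead of inserting U+FFFD. `decodeUtf8Strict` and the sample buffers are hypothetical names used for illustration only, and `fatal` mode is assumed to be available (Node.js built with ICU, which is the default).

```javascript
import { isUtf8 } from "node:buffer";

// Hypothetical helper: returns the decoded string, or throws a TypeError
// if the bytes are not valid UTF-8 (no silent U+FFFD substitution).
function decodeUtf8Strict(bytes) {
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(bytes);
}

// Valid UTF-8 that legitimately contains U+FFFD as a code point.
const valid = Buffer.from("héllo \uFFFD", "utf8");
// 0xFF can never appear in well-formed UTF-8.
const invalid = Buffer.from([0x68, 0x69, 0xff]);

console.log(isUtf8(valid));   // true
console.log(isUtf8(invalid)); // false

console.log(decodeUtf8Strict(valid) === "héllo \uFFFD"); // true

let strictDecodeThrew = false;
try {
  decodeUtf8Strict(invalid);
} catch (err) {
  // Invalid input surfaces as an exception rather than a replacement character.
  strictDecodeThrew = err instanceof TypeError;
}
console.log(strictDecodeThrew); // true
```

If only a boolean is needed, `isUtf8` alone is enough; the strict decoder is useful when the decoded text is wanted in the same step.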
