Reputation: 88946
I have a Buffer object and I would like to check whether all of it is valid UTF-8. Ideally, I would also like to get a string with the decoded UTF-8 text.
I tried Buffer.toString, which takes an encoding argument that defaults to utf8. Unfortunately, the docs say this:
If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.
That's not what I want: I would rather get an exception or a boolean flag. Just checking whether the resulting string contains U+FFFD is not equivalent, because the input text could already have contained U+FFFD (as a valid Unicode code point). Of course, one could count the occurrences of U+FFFD in the buffer and in the string and compare them, but that seems needlessly complicated and inefficient.
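To illustrate the ambiguity, here is a minimal sketch (the variable names are made up for this example): one buffer holds a correctly encoded U+FFFD, the other holds a single invalid byte, and both decode to the same string.

```javascript
// A buffer that legitimately contains U+FFFD, encoded as valid UTF-8 (EF BF BD).
const validBuffer = Buffer.from("\uFFFD", "utf8");

// A buffer holding one byte that can never appear in valid UTF-8.
const invalidBuffer = Buffer.from([0xff]);

const fromValid = validBuffer.toString("utf8");
const fromInvalid = invalidBuffer.toString("utf8");

// Both strings are "\uFFFD", so the decoded result alone cannot tell them apart.
console.log(fromValid === fromInvalid); // true
```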
Is there a better way?
Upvotes: 4
Views: 2603
Reputation: 14819
import NodeBuffer, {Buffer} from "node:buffer";
NodeBuffer.isUtf8(input)
- Added in: version 19.4.0, version 18.14.0.
- input (<Buffer> | <ArrayBuffer> | <TypedArray>)
This function returns true if input contains only valid UTF-8-encoded data, including the case in which input is empty.
Throws if the input is a detached array buffer.
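A possible way to combine this with decoding, as a rough sketch only: validate first, then call toString. The helper name decodeIfValid is hypothetical, the example uses CommonJS require instead of the import above, and it assumes a Node version that ships isUtf8. Note that this approach walks the buffer twice (once to validate, once to decode).

```javascript
const { isUtf8 } = require("node:buffer");

// Hypothetical helper: returns the decoded string, or null if the bytes
// are not valid UTF-8.
function decodeIfValid(buf) {
  // After validation, toString has no invalid bytes left to replace with U+FFFD.
  return isUtf8(buf) ? buf.toString("utf8") : null;
}

console.log(decodeIfValid(Buffer.from([72, 195, 182]))); // 'Hö'
console.log(decodeIfValid(Buffer.from([1, 2, 255, 3, 5]))); // null
```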
Upvotes: 5
Reputation: 88946
You can use TextDecoder from util. To get an exception, set the fatal flag to true.
new TextDecoder("utf8", { fatal: true }).decode(buffer)
For example:
> new TextDecoder("utf8", { fatal: true }).decode(Buffer.from([72, 195, 182, 240, 159, 146, 154, 215, 169, 214, 184, 215, 129]))
'Hö💚שָׁ'
> new TextDecoder("utf8", { fatal: true }).decode(Buffer.from([1, 2, 255, 3, 5]))
Uncaught:
TypeError [ERR_ENCODING_INVALID_ENCODED_DATA]: The encoded data was not valid for encoding utf-8
at __node_internal_captureLargerStackTrace (node:internal/errors:478:5)
at new NodeError (node:internal/errors:387:5)
at TextDecoder.decode (node:internal/encoding:433:15) {
errno: 12,
code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}
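If a result value is more convenient than an exception, the call can be wrapped in a small helper. This is only a sketch: tryDecodeUtf8 and its return shape are made up for illustration, and it relies on TextDecoder being available as a global in recent Node versions. Keep in mind that, with default options, TextDecoder drops a leading byte order mark from the output.

```javascript
// Hypothetical helper: decodes strictly and reports success as a flag
// instead of throwing.
function tryDecodeUtf8(bytes) {
  try {
    // A fresh decoder per call keeps the sketch free of shared state.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { ok: true, text };
  } catch (err) {
    // Invalid input surfaces as a TypeError; anything else is rethrown.
    if (err instanceof TypeError) {
      return { ok: false, text: null };
    }
    throw err;
  }
}

console.log(tryDecodeUtf8(Buffer.from([72, 195, 182]))); // { ok: true, text: 'Hö' }
console.log(tryDecodeUtf8(Buffer.from([1, 2, 255, 3, 5]))); // { ok: false, text: null }
```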
Upvotes: 3