Raedwald

Reputation: 48684

Text encoding of Protocol Buffers string fields

If a C++ program receives a Protocol Buffers message that has a string field, which the generated C++ code represents as a std::string, what is the encoding of the text in that field? Is it UTF-8?

Upvotes: 8

Views: 9275

Answers (1)

jpa

Reputation: 12176

Yes. Protobuf string fields must always contain valid UTF-8.

See the Language Guide:

A string must always contain UTF-8 encoded or 7-bit ASCII text.

(Any 7-bit ASCII text is also valid UTF-8.)
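In C++ this means the std::string you get back from a string field simply holds UTF-8 bytes, so size() counts bytes, not characters. A minimal sketch, assuming the string is already valid UTF-8; CountCodePoints is a hypothetical helper, not part of the protobuf API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in `utf8`, which is assumed
// to already be valid UTF-8. Each code point has exactly one byte that is not
// a continuation byte (continuation bytes have the bit pattern 10xxxxxx), so
// counting the non-continuation bytes counts code points.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (const char ch : utf8) {
        const auto byte = static_cast<unsigned char>(ch);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, "café" stored as the bytes "caf\xC3\xA9" has size() 5 but CountCodePoints 4, while for pure ASCII text the two values are equal.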

Not all protobuf implementations enforce this, but if I recall correctly, at least the Python library refuses to decode string fields that are not valid UTF-8.
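Because enforcement varies, a C++ program that wants to be defensive can check the bytes itself before treating a field as text. A minimal sketch of a strict validator following the well-formed byte sequences of RFC 3629; IsValidUtf8 is a hypothetical helper, not a protobuf function:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8 per RFC 3629.
// Rejects stray continuation bytes, truncated sequences, overlong encodings,
// UTF-16 surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const auto lead = static_cast<unsigned char>(s[i]);
        if (lead <= 0x7F) {  // 7-bit ASCII: one byte
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        } else if (lead == 0xE0) {
            len = 3; lo = 0xA0;  // reject overlong 3-byte forms
        } else if ((lead >= 0xE1 && lead <= 0xEC) || lead == 0xEE || lead == 0xEF) {
            len = 3;
        } else if (lead == 0xED) {
            len = 3; hi = 0x9F;  // reject surrogates
        } else if (lead == 0xF0) {
            len = 4; lo = 0x90;  // reject overlong 4-byte forms
        } else if (lead >= 0xF1 && lead <= 0xF3) {
            len = 4;
        } else if (lead == 0xF4) {
            len = 4; hi = 0x8F;  // reject code points above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }
        if (n - i < len) return false;  // truncated sequence
        const auto second = static_cast<unsigned char>(s[i + 1]);
        if (second < lo || second > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const auto c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) return false;  // must be 10xxxxxx
        }
        i += len;
    }
    return true;
}
```

For example, IsValidUtf8("caf\xC3\xA9") is true, while the Latin-1 bytes "caf\xE9" are rejected as a truncated sequence.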

Upvotes: 9
