Vlad K

Reputation: 2851

Why doesn't protobuf support an alphanumeric type?

I'm wondering why protobuf doesn't support the commonly used alphanumeric type. It would allow several characters to be packed into fewer bytes than plain one-byte-per-character text (more so if case-insensitive), very efficiently, since no compression of any sort is involved. Is this something the protobuf developers are planning to implement in the future?

Thanks,
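To put rough numbers on the saving the question has in mind, here is a small Python sketch (the function name `packed_bytes` is hypothetical). It assumes the densest simple scheme: a string of n characters drawn from a k-symbol alphabet is treated as one base-k number, which needs about n·log2(k)/8 bytes, rounded up, compared with one byte per character for ASCII text in UTF-8.

```python
def packed_bytes(n_chars: int, alphabet_size: int) -> int:
    """Smallest whole number of bytes that can hold any string of n_chars
    symbols from an alphabet of alphabet_size symbols, treating the string
    as a single base-alphabet_size number."""
    distinct_strings = alphabet_size ** n_chars
    # Bits needed for the largest value (distinct_strings - 1), rounded up to whole bytes.
    return ((distinct_strings - 1).bit_length() + 7) // 8


n = 32
print("UTF-8, one byte per ASCII char:", n)
print("0-9A-Za-z (62 symbols):", packed_bytes(n, 62))
print("0-9A-Z (36 symbols):", packed_bytes(n, 36))
```

For 32 characters this works out to 24 bytes for the case-sensitive alphabet and 21 bytes for the case-insensitive one, against 32 bytes of UTF-8; for example, three case-insensitive characters fit in two bytes.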

Upvotes: 0

Views: 271

Answers (1)

Marc Gravell

Reputation: 1063994

In today's global world, it is fairly rare for "alphanumeric" to mean only the 62 characters in the ranges 0-9, A-Z and a-z. If we consider just the Basic Multilingual Plane, about 48k code units (which is to say: over 70% of the available range) count as "alphanumeric", and a fairly standard (although even this may be suboptimal in some locales) way of encoding them is UTF-8, which protobuf already uses for the string type.
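To illustrate the point about UTF-8, the Python sketch below (the helper `describe` is made up for this example) shows that ASCII letters and digits already cost one byte each in a protobuf `string`, while other alphanumeric characters take two or three bytes. It then counts the BMP code points for which Python's `str.isalnum()` is true. That is Python's own definition of "alphanumeric" (Unicode letters plus numeric characters), so the exact figure depends on the Python/Unicode version and need not match the "about 48k" above exactly.

```python
def describe(s: str) -> tuple[bool, int, int]:
    """Return (str.isalnum(), character count, UTF-8 byte count) for s."""
    return s.isalnum(), len(s), len(s.encode("utf-8"))


samples = {
    "ASCII letters and digits": "abc123",
    "Latin with accents": "\u00e9t\u00e9",   # "été"
    "CJK ideographs": "\u65e5\u672c",         # "日本"
    "Arabic-Indic digits": "\u0661\u0662",    # "١٢"
}
for label, s in samples.items():
    alnum, chars, nbytes = describe(s)
    print(f"{label}: isalnum={alnum}, chars={chars}, UTF-8 bytes={nbytes}")

# Count BMP code points that Python itself classifies as alphanumeric.
# Lone surrogates (U+D800..U+DFFF) are simply reported as non-alphanumeric.
bmp_alnum = sum(1 for cp in range(0x10000) if chr(cp).isalnum())
print("BMP code points with str.isalnum() True:", bmp_alnum)
```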

I cannot see much advantage in using a dedicated wire-type for this category of data. Any additional wire-type would also need support added to multiple libraries, because an unknown wire-type renders the stream unreadable to down-level parsers: you cannot even skip over unwanted data if you don't know the wire-type (the wire-type defines the skip rules).
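The point about skip rules can be made concrete with a minimal Python sketch of how a parser steps over fields it does not recognise. It follows the wire types from the protobuf encoding documentation (0 = VARINT, 1 = I64, 2 = LEN, 5 = I32); the helper names are hypothetical, the deprecated group types 3 and 4 are deliberately left out, and there is no bounds checking. A value with wire type 6 or 7 has no defined length, so the parser cannot find where the next field begins and has to give up.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift   # low 7 bits carry data, least significant group first
        if not b & 0x80:                 # high bit clear: this was the last byte
            return result, pos
        shift += 7


def skip_value(buf: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a field's value, using only its wire type."""
    if wire_type == 0:                   # VARINT: read until a byte without the high bit
        return read_varint(buf, pos)[1]
    if wire_type == 1:                   # I64: always 8 bytes
        return pos + 8
    if wire_type == 2:                   # LEN: varint length prefix, then that many bytes
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 5:                   # I32: always 4 bytes
        return pos + 4
    # Groups (3/4) are not handled in this sketch; 6 and 7 are undefined, so the
    # length of the value is unknowable and the rest of the stream is unreadable.
    raise ValueError(f"cannot skip a value with wire type {wire_type}")


def field_numbers(buf: bytes) -> list[int]:
    """List the field numbers in a serialized message, skipping every value."""
    pos, seen = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        seen.append(field_number)
        pos = skip_value(buf, pos, wire_type)
    return seen


print(field_numbers(bytes.fromhex("08960112026869")))
```

For example, the bytes `08 96 01 12 02 68 69` (field 1 = varint 150, field 2 = the string "hi") yield the field numbers [1, 2], whereas `08 96 01 1e 00` raises ValueError at the second key, which claims wire type 6.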

Of course, since you also have the bytes type available, you can feel free to do anything bespoke you want inside that.
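As a sketch of the kind of bespoke scheme the `bytes` type allows, the Python below packs a 0-9A-Za-z string as a single base-62 number and wraps it in a length-delimited (wire type 2) field, which is how a `bytes` field appears on the wire. Everything here is an assumption for illustration: the alphabet, the helper names and field number 1 are hypothetical, and in real code you would assign the packed bytes to a generated message's `bytes` field rather than hand-encode it. The character count has to travel separately (for example in another field), because leading `0` characters vanish once the string becomes a number.

```python
# Hypothetical alphabet for this sketch: digit value = index in the string.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def pack_alnum(s: str, alphabet: str = ALPHABET) -> bytes:
    """Pack s as a single base-len(alphabet) number, big-endian."""
    k = len(alphabet)
    n = 0
    for ch in s:
        n = n * k + alphabet.index(ch)   # ValueError if ch is outside the alphabet
    # Fixed width: enough bytes for any string of this length.
    width = ((k ** len(s) - 1).bit_length() + 7) // 8
    return n.to_bytes(width, "big")


def unpack_alnum(data: bytes, length: int, alphabet: str = ALPHABET) -> str:
    """Inverse of pack_alnum; the character count must be known separately."""
    k = len(alphabet)
    n = int.from_bytes(data, "big")
    chars = []
    for _ in range(length):
        n, digit = divmod(n, k)          # least significant character first
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative int, 7 bits per byte, low group first."""
    out = bytearray()
    while True:
        b = value & 0x7F
        value >>= 7
        if value:
            out.append(b | 0x80)         # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode payload as a length-delimited (wire type 2) field: key, length, data."""
    return encode_varint(field_number << 3 | 2) + encode_varint(len(payload)) + payload


packed = pack_alnum("Hello2024")
print(len(packed), bytes_field(1, packed).hex())
print(unpack_alnum(packed, 9))
```

For "Hello2024" the payload is 7 bytes instead of 9, so the whole field takes 9 bytes rather than the 11 of an equivalent `string` field. Every reader must know the scheme, since to protobuf the value is just opaque bytes.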

Upvotes: 2
