Reputation: 10246
FlatBuffers specifically avoids certain encoding standardization/guarantees. Per the documentation (https://google.github.io/flatbuffers/flatbuffers_internals.html):
"This may mean two different implementations may produce different binaries given the same input values, and this is perfectly valid."
Okay, but are strings written to the buffer byte-for-byte from whatever representation they were given, or re-encoded into some other representation? And are encode operations deterministic given the same FlatBuffers version and generated code (do N operations with identical parameters produce identical buffers)?
What about sizing? Will a reduction in the size of dynamic structures (e.g. vectors, or string values being shortened) produce a corresponding reduction in the size of the encoded structure?
I really don't understand how the string encoding works and I don't have time at the moment to take apart the inner code.
I created a sample definition with a general parent->child->grandchild structure: the parent type has a vector of the child type, and the grandchild type embeds a string as well as a struct. I included several kinds of values to exaggerate any entropy the different types might introduce into the output size. I then populated the string value in the grandchild with a five-rune string repeated fifty times, decremented the repeat count by one by hand on each run, and printed the size of the final encoding each time:
$ go run main.go
String size: (250)
Output encoding size: (400)
$ go run main.go
String size: (245)
Output encoding size: (400)
$ go run main.go
String size: (240)
Output encoding size: (392)
$ go run main.go
String size: (235)
Output encoding size: (384)
Why does the output size of the encoding not change after I drop the first five bytes from the original string value? And why does it then shrink by eight bytes for every further five bytes I drop? Since these are strings, I would not have thought alignment plays a factor here.
I still have the questions above, but it seems like it might be a safe assumption (er, guarantee) that (1) the size of the encoding is stable for the same arguments, and (2) it will shrink along with a reduction in the size of one or more values within it. Is that a true statement?
Thanks for saving me some time and error in not having to hack through this on my own at the moment (hopefully).
For reference, this is the definition:
namespace testformat;

struct Vector {
  field9:ulong;
  field10:ulong;
  field11:ulong;
}

table Grandchild {
  field5:ulong;
  string6:string;
  field7:ulong;
  field8:Vector;
}

table Child {
  field3:ulong;
  field4:ulong;
  grandchild:Grandchild;
}

table Parent {
  field1:ulong;
  field2:ulong;
  children:[Child];
}

root_type Parent;
This is the part of the Go code with the repeated string value that I change (at the top):
stringValue := strings.Repeat("strin", 50) // 5 runes x 50; the repeat count is what I decrement by hand
fmt.Printf("String size: (%d)\n", len(stringValue))

// Strings must be created before the table that references them is started.
stringOffset := b.CreateString(stringValue)

testformat.GrandchildStart(b)
testformat.GrandchildAddField5(b, 44)
testformat.GrandchildAddString6(b, stringOffset)
testformat.GrandchildAddField7(b, 55)
// Structs are serialized inline, so the Vector is created while the table is open.
vectorOffset := testformat.CreateVector(b, 11, 22, 33)
testformat.GrandchildAddField8(b, vectorOffset)
grandchildOffset := testformat.GrandchildEnd(b)
Upvotes: 1
Views: 1499
Reputation: 6074
This contains many questions, so here are some answers:
Since your buffer contains data that must be 8-byte aligned (a double or long or whatever), the size of the buffer will only ever change in 8-byte increments. A string is 4-byte aligned for its size prefix, but the actual string data is only 1-byte aligned, so it is possible to add characters to a string without a change in buffer size: the new characters simply use what were previously padding bytes. Even though string bytes don't need alignment, adjacent data may.

Upvotes: 2
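This explains the numbers in the question. Under a simplified model of the layout (not the library's actual sizing code), a string occupies a 4-byte length prefix, the bytes themselves, and a NUL terminator, with the surrounding 8-byte-aligned data forcing the total up to the next multiple of 8. The hypothetical `stringSlot` below computes that:

```go
package main

import "fmt"

// align rounds n up to the next multiple of a (a must be a power of two).
func align(n, a int) int {
	return (n + a - 1) &^ (a - 1)
}

// stringSlot is a simplified estimate of the space a string occupies in a
// buffer whose largest scalars are 8-byte aligned: 4-byte length prefix +
// string bytes + NUL terminator, padded to an 8-byte boundary. This is an
// assumption-based model, not FlatBuffers' internal sizing code.
func stringSlot(strLen int) int {
	return align(4+strLen+1, 8)
}

func main() {
	for _, n := range []int{250, 245, 240, 235} {
		fmt.Printf("len=%d slot=%d\n", n, stringSlot(n))
	}
}
```

For the question's lengths this yields slots of 256, 256, 248, 240: dropping 250 to 245 changes nothing (4+245+1=250 still pads to 256), while each further 5-byte drop crosses an 8-byte boundary, matching the observed 400, 400, 392, 384 totals.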