Reputation: 581
I recently started using FlatBuffers in C++. I found that FlatBuffers seems to rely heavily on type punning through reinterpret_cast. This makes me a little uncomfortable, because I've learned that it is undefined behavior in many cases.
For example, a Rect struct declared in the fbs file as:
struct Rect {
  left:int;
  top:int;
  right:int;
  bottom:int;
}
turns into this C++ code for reading it from a table:
const xxxxx::Rect *position() const {
  return GetStruct<const xxxxx::Rect *>(VT_POSITION);
}
and the definition of GetStruct simply uses reinterpret_cast.
My questions are:
Update:
The buffer may simply come from the network or from disk. I don't know whether it makes a difference if the buffer actually comes from memory that was written by the writer in the same C++ program.
But the writer's auto-generated method is:
void add_position(const xxxxx::Rect *position) {
  fbb_.AddStruct(Char::VT_POSITION, position);
}
which in turn calls this method and this method, and so also ends up using reinterpret_cast.
Upvotes: 1
Views: 815
Reputation: 29962
I haven't analyzed the whole FlatBuffers source code, but I don't see where these objects are created: there is no new expression that would create the objects P points to here:
template<typename P> P GetStruct(voffset_t field) const {
  auto field_offset = GetOptionalFieldOffset(field);
  auto p = const_cast<uint8_t *>(data_ + field_offset);
  return field_offset ? reinterpret_cast<P>(p) : nullptr;
}
So, it seems that this code does have undefined behavior.
However, this is only true up to and including C++17. C++20 introduces implicit-lifetime types (scalar types and aggregates, for example, are implicit-lifetime types). If the type P points to has implicit lifetime, then this code can be well defined, provided that the same memory area is always accessed through a type that doesn't violate the type-punning rules (for example, it is always accessed as the same type).
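If you would rather not depend on which standard applies, a commonly suggested alternative is to copy the bytes into a real object with std::memcpy instead of casting the pointer. The following is only a sketch: Rect here is a hypothetical mirror of the schema (not the generated FlatBuffers class), ReadRect is a made-up helper, and it assumes the host byte order matches the buffer and that the caller has already checked bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical mirror of the schema's Rect; NOT the class flatc generates.
struct Rect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};

// std::memcpy into an object is only well defined for trivially copyable types.
static_assert(std::is_trivially_copyable<Rect>::value,
              "Rect must be trivially copyable to be filled with memcpy");

// Hypothetical helper: copies sizeof(Rect) bytes starting at data + offset
// into a Rect that really exists, so no object is accessed through a
// pointer of an unrelated type.
// Assumption: the caller guarantees offset + sizeof(Rect) is inside the buffer.
inline Rect ReadRect(const std::uint8_t *data, std::size_t offset) {
    assert(data != nullptr);
    Rect out;
    std::memcpy(&out, data + offset, sizeof(Rect));
    return out;
}
```

This returns a copy rather than a pointer into the buffer, so it gives up the zero-copy access that GetStruct provides; for a 16-byte struct, compilers typically turn the memcpy into a few plain loads.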
Upvotes: 3
Reputation: 30850
I think both your questions are answered by the Flatbuffers: Use in C++ page:
Direct memory access
As you can see from the above examples, all elements in a buffer are accessed through generated accessors. This is because everything is stored in little endian format on all platforms (the accessor performs a swap operation on big endian machines), and also because the layout of things is generally not known to the user.
For structs, layout is deterministic and guaranteed to be the same across platforms (scalars are aligned to their own size, and structs themselves to their largest member), and you are allowed to access this memory directly by using sizeof() and memcpy on the pointer to a struct, or even an array of structs.
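The layout rule quoted above can be checked at compile time for a plain C++ struct, and the "sizeof() and memcpy" access can then be applied to a whole array. This is a sketch under assumptions: PlainRect and CopyRects are hypothetical names, not FlatBuffers API, and the host is assumed to be little endian (a big-endian host would still need a byte swap per field).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical plain struct matching the Rect schema: four 32-bit ints.
struct PlainRect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};

// Compile-time checks that the C++ layout matches the documented rule:
// scalars aligned to their own size, struct aligned to its largest member.
static_assert(std::is_standard_layout<PlainRect>::value, "need standard layout");
static_assert(std::is_trivially_copyable<PlainRect>::value, "need trivial copy");
static_assert(sizeof(PlainRect) == 4 * sizeof(std::int32_t), "unexpected padding");
static_assert(offsetof(PlainRect, bottom) == 3 * sizeof(std::int32_t), "unexpected offset");
static_assert(alignof(PlainRect) == alignof(std::int32_t), "unexpected alignment");

// Hypothetical helper: copies count consecutive structs out of raw bytes.
// Assumption: first points at count * sizeof(PlainRect) readable bytes.
inline std::vector<PlainRect> CopyRects(const void *first, std::size_t count) {
    std::vector<PlainRect> out(count);
    if (count != 0) {
        assert(first != nullptr);
        std::memcpy(out.data(), first, count * sizeof(PlainRect));
    }
    return out;
}
```

If any of the static_asserts fails on a given compiler or platform, the byte-for-byte copy would not be valid there and field-by-field decoding would be needed instead.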
These paragraphs guarantee that – given a valid flatbuffer – all memory accesses are valid, as the memory at that specific location will match the expected layout.
If you are processing untrusted flatbuffers, you first need to use the verifier functions to ensure the flatbuffer is valid:
This verifier will check all offsets, all sizes of fields, and null termination of strings to ensure that when a buffer is accessed, all reads will end up inside the buffer.
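To illustrate the kind of check this describes, here is a minimal sketch of a single bounds test. RangeInsideBuffer is a hypothetical helper written for this answer, not the FlatBuffers verifier and not part of its API; the real verifier applies checks of this sort to every offset and field in the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT FlatBuffers API.
// Returns true when the byte range [offset, offset + size) lies entirely
// inside a buffer that is buffer_size bytes long.
inline bool RangeInsideBuffer(std::size_t buffer_size,
                              std::size_t offset,
                              std::size_t size) {
    // Deliberately avoids computing offset + size, which could wrap around
    // for a hostile offset and make an out-of-range read look valid.
    return offset <= buffer_size && size <= buffer_size - offset;
}
```

A reader would only perform the cast or memcpy at an offset after a check like this has passed for sizeof of the struct being read.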
Upvotes: 0