Reputation: 452
I am using flatbuffers in Go to send an array of 10000 floats over TCP between two ports on my local machine. I send the same message in a loop that does nothing else. I achieve only about 2 ms per message, whereas in C++ I achieve about 140 microseconds per message. I have the following schema for my flatbuffers messages:
namespace MyModel;
table Features {
data:[float32];
}
root_type Features;
and then in the Go code I have builder := flatbuffers.NewBuilder(1024)
and conn, err := net.Dial("tcp", endPoint)
then after a few other things I have in the sending loop:
builder.Reset()
// Reserve space for the whole vector up front.
MyModel.FeaturesStartDataVector(builder, nFloat32s)
// FlatBuffers builds back to front, so prepend the elements in reverse.
for i := nFloat32s - 1; i >= 0; i-- {
	builder.PrependFloat32(data[i])
}
featuresData := builder.EndVector(nFloat32s)
MyModel.FeaturesStart(builder)
MyModel.FeaturesAddData(builder, featuresData)
features := MyModel.FeaturesEnd(builder)
builder.Finish(features)
msg := builder.FinishedBytes()
// Frame the message with a 4-byte little-endian length prefix.
msgLen := make([]byte, 4)
flatbuffers.WriteUint32(msgLen, uint32(len(msg)))
conn.Write(msgLen)
conn.Write(msg)
The Python program receiving the data gets the correct number of messages with the correct contents. But this is about 14x slower than what I measured with a C++ sender feeding the same Python program. I am using nFloats = 100000.
Profiling shows that PrependFloat32 is taking a lot of time:
(pprof) top5 -cum
Showing nodes accounting for 2850ms, 61.29% of 4650ms total
Dropped 5 nodes (cum <= 23.25ms)
Showing top 5 nodes out of 18
      flat  flat%   sum%        cum   cum%
         0     0%     0%     4600ms 98.92%  main.main
     550ms 11.83% 11.83%     4600ms 98.92%  main.run
         0     0% 11.83%     4600ms 98.92%  runtime.main
    1140ms 24.52% 36.34%     3640ms 78.28%  github.com/google/flatbuffers/go.(*Builder).PrependFloat32
    1160ms 24.95% 61.29%     1790ms 38.49%  github.com/google/flatbuffers/go.(*Builder).Prep
Can I make this faster?
(Of course, for such flat data I could just use raw sockets, but later on I will add more complexity to the message.)
Upvotes: 2
Views: 1468
Reputation: 2102
For anyone who is curious about the solution in the linked github code from snow_abstraction's comment, the question uses:
for i := nFloat32s - 1; i >= 0; i-- {
	builder.PrependFloat32(data[i])
}
versus the linked code:
for i := nFloat32s - 1; i >= 0; i-- {
	builder.PlaceFloat32(data[i])
}
PlaceFloat32 is faster because: "MyModel.FeaturesStartDataVector allocates enough space so skip the extra checks that an idiomatic call to build.PrependFloat32(data[i]) would entail.".
The flatbuffers source code confirms that PrependFloat32 calls Prep to do some alignment and sizing checks. These appear to be redundant here, because the earlier call to MyModel.FeaturesStartDataVector calls StartVector, which in turn calls Prep. Since Prep has already been called to check the bounds of the whole array, there is no need to call it again for every individual float32 written to the array.
Upvotes: 2
Reputation: 6074
What @icza says is worth trying. Beyond that, maybe Go has some kind of array copy function that can be used to add all floats at once, though for that you'd need to add some kind of CreateFloatVector function to the builder. There is already CreateByteVector: https://github.com/google/flatbuffers/blob/521e255ad9656a213971b30ba1beeec395b2e27e/go/builder.go#L343
Upvotes: 1