Reputation: 844
We're working with a latency-critical application at 30 fps, with multiple stages in the pipeline (e.g. compression, network send, image processing, 3D calculations, texture sharing, etc.).
Ordinarily we could achieve these multiple stages like so:
[Process 1][Process 2][Process 3]
---------------time------------->
However, if we can stack these processes, then it is possible that as [Process 1] is working on the data, it is continuously passing its results to [Process 2]. This is similar to how iostream works in C++, i.e. "streaming". With threading, this can result in reduced latency:
[Process 1]
[Process 2]
[Process 3]
<------time------->
Let's presume that [Process 2] is our UDP communication (i.e. [Process 1] is on Computer A and [Process 3] is on Computer B).
The output of [Process 1] is approximately 3 MB (i.e. typically > 300 jumbo packets at 9 KB each), therefore we can presume that when we call ZeroMQ's:
socket->send(message); // message size is 3 MB
Then somewhere in the library or OS, the data is being split into packets which are sent in sequence. This function presumes that the message is already fully formed.
Is there a way (e.g. an API) for parts of the message to be 'under construction' or 'constructed on demand' when sending large data over UDP? And would this also be possible on the receiving side (i.e. being allowed to act on the beginning of the message while the remainder is still incoming)? Or is the only way to manually split the data into smaller chunks ourselves?
Note:
the network connection is a straight wire GigE connection between Computers A and B.
Upvotes: 2
Views: 4259
Reputation: 1
TL;DR: the simple answer is no, a few SLOCs of ZeroMQ will not make your project win on their own. The project is doable, but another design view is needed.
Having stated a minimum set of facts:
- 30 fps,
- 3 MB/frame,
- 3-stage processing pipeline,
- private host-to-host GigE interconnect,

there is not much to decide without further details.
Sure, there is a threshold of about 33.333 [ms] for the pipeline's end-to-end processing (while you plan to lose some 30 [ms] straight on network I/O), and the rest is left in the designer's hands. Ouch!
The I/O design phase: ZeroMQ is a workhorse, but that does not mean it can save a poor design.
If you spend a few moments with the timing constraints, the LAN network-I/O latency is the worst enemy in your view.
Ref.: Latency numbers everyone should know
If your code allows for parallelised processing, your plans will make much better use of "progressive"-pipeline processing with ZeroMQ's zero-copy / (almost) zero-latency / zero-blocking inproc:// transport class, and your code may support "progressive"-pipelining as you go among the multiple processing phases. Remember, this is not a one-liner, and do not expect a single SLOC to control your "progressive"-pipelining fabrication.
Nanoseconds [ns] matter, so read your numbers from data-processing micro-benchmarks carefully. They decide about your success.
Here you may read how much time was "lost" / "wasted" in just changing a colour-representation, which your code will need in object detection, 3D scene-processing and texture post-processing. So set your design criteria to rather high standards.
Check the lower-left window numbers about the milliseconds lost in this real-time pipeline.
If your code's processing requirements do not safely fit into your 33,000,000 [ns] time-budget with { quad | hexa | octa }-core CPU resources, and if the numerical processing may benefit from many-core GPU resources, then Amdahl's Law may well justify some asynchronous multi-GPU-kernel processing approach, with its additional +21,000 ~ 23,000 [ns] lost in initial/terminal data transfers and +350 ~ 700 [ns] introduced by GPU.gloMEM -> GPU.SM.REG latency masking (which happily has enough quasi-parallel thread-depth in your case of image processing, even for the low computational density of the expected trivial GPU-kernels).
Ref.: GPU/CPU latencies one shall validate the initial design against.
Upvotes: 3
Reputation: 240404
No, you can't realistically do it. The API doesn't provide for it, and ZeroMQ promises that a receiver will get a complete message (including multi-part messages) or no message at all, which means that it won't present a message to a receiver until it's fully transferred. Splitting the data yourself into individually actionable chunks that are sent as separate ZeroMQ messages would be the right thing to do, here.
Upvotes: 4