Danny
Danny

Reputation: 2683

C++ Container for high performance FIFO

I need to optimize some legacy code and am fairly new to C++.

The code does network packet processing in two threads, one thread pushing packets to a FIFO [topupBuffer] and the other thread reading from the queue and sending them out an IP socket [writeToIPOutput]. The legacy code uses std::deque to implement the FIFO.

However, running the program uses a lot of CPU, up to 50% (where it needs to be more like 5%). Running gprof seems to reveal std::deque is the culprit. (I am not sure I am interpreting the profile results correctly, so help is appreciated)

Excepts from the profile output: topupBuffer hierarchy:

index % time    self  children    called     name
                0.65    2.51       1/1           DvIPFilePlayback::topupBufferThreadMethod(void*) [2]
[1]     60.5    0.65    2.51       1         DvIPFilePlayback::topupBuffer() [1]
                0.27    1.15 4025575/4025575     DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
                0.03    0.56 4026668/4026668     std::deque<TTsPacket, std::allocator<TTsPacket> >::push_back(TTsPacket const&) [6]
                0.03    0.15 4046539/5749754     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::size() const [17]

and

[5]     27.2    0.27    1.15 4025575         DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
                0.04    0.30 4031674/4031674     std::deque<TTsPacket, std::allocator<TTsPacket> >::pop_front() [11]
                0.03    0.30 8058004/8058004     std::deque<TTsPacket, std::allocator<TTsPacket> >::size() const [12]
                0.01    0.19  576183/576183      DvPlaybackBC::insertToPlaybackBuffer(TPlaybackBuffer const&) [22]
                0.04    0.11 4029401/4029401     std::deque<TTsPacket, std::allocator<TTsPacket> >::front() [25]

writeToIPOutput Hierarchy

[3]     36.8    0.92    1.00       1         DvIPPlaybackBC::writeToIPOutput() [3]
                0.31    0.00 1129444/1129444     TPlaybackBuffer::operator=(TPlaybackBuffer const&) [13]
                0.01    0.18  579235/1155128     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::push_back(TPlaybackBuffer const&) [8]
                0.03    0.10 1135318/1135318     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::pop_front() [27]

I guess writeToIPOutput spends too much time on assignment. I can work on that. But topupBuffer spends it's time in std::deque.

Is that the correct interpretation of the profile output?

If so, then is using a different container going to be more efficient and, if so, which one?

Thank you

EDIT I The explanatory notes at the end of the call tree says:

% time  This is the percentage of the `total' time that was spent
        in this function and its children.  Note that due to
        different viewpoints, functions excluded by options, etc,
        these numbers will NOT add up to 100%.

self    This is the total amount of time spent in this function.

children    This is the total amount of time propagated into this
        function by its children.

So looking at bufferizeTsPackets, 1.15 is spent in its children, of which, 0.30 + 0.30 + 0.11 = 0.71 is spent in different deque methods (push_back, size etc). Right? So 0.71 is over half of the total time (1.15) spent in in the children (??)

Upvotes: 3

Views: 3234

Answers (1)

Thomas Matthews
Thomas Matthews

Reputation: 57749

A more efficient structure would be to implement a circular queue (ring buffer) using an array.

Since arrays are fixed size, you either have to make the array large enough so there is no data overrun; or only store the last N values, where N is the capacity of the buffer.

Many embedded systems use arrays to reduce any issues of memory fragmentation caused by dynamic memory location.

If your array is small enough, it can fit in the processor's data cache; which speeds up computation.

Upvotes: 2

Related Questions