Reputation: 2683
I need to optimize some legacy code and am fairly new to C++.
The code does network packet processing in two threads: one thread pushes packets onto a FIFO [topupBuffer] and the other thread reads from the queue and sends them out an IP socket [writeToIPOutput]. The legacy code uses std::deque to implement the FIFO.
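The real classes are much bigger, but the pattern is essentially a deque shared by the two threads. The following is only a minimal sketch of that pattern; the class name, the mutex and the 188-byte packet layout are illustrative assumptions, not the actual legacy code:

    #include <deque>
    #include <mutex>

    // Illustrative stand-in for the real TTsPacket type.
    struct TTsPacket { char data[188]; };

    class PacketFifo {
    public:
        // Producer side (topupBuffer thread).
        void push(const TTsPacket& p)
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(p);
        }

        // Consumer side (writeToIPOutput thread). Returns false if the queue is empty.
        bool pop(TTsPacket& out)
        {
            std::lock_guard<std::mutex> lock(m_);
            if (q_.empty())
                return false;
            out = q_.front();
            q_.pop_front();
            return true;
        }

    private:
        std::deque<TTsPacket> q_;
        std::mutex m_;
    };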
However, running the program uses a lot of CPU, up to 50% (where it needs to be more like 5%). Running gprof seems to reveal that std::deque is the culprit. (I am not sure I am interpreting the profile results correctly, so help is appreciated.)
Excerpts from the profile output: topupBuffer hierarchy:
index % time self children called name
0.65 2.51 1/1 DvIPFilePlayback::topupBufferThreadMethod(void*) [2]
[1] 60.5 0.65 2.51 1 DvIPFilePlayback::topupBuffer() [1]
0.27 1.15 4025575/4025575 DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
0.03 0.56 4026668/4026668 std::deque<TTsPacket, std::allocator<TTsPacket> >::push_back(TTsPacket const&) [6]
0.03 0.15 4046539/5749754 std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::size() const [17]
and
[5] 27.2 0.27 1.15 4025575 DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
0.04 0.30 4031674/4031674 std::deque<TTsPacket, std::allocator<TTsPacket> >::pop_front() [11]
0.03 0.30 8058004/8058004 std::deque<TTsPacket, std::allocator<TTsPacket> >::size() const [12]
0.01 0.19 576183/576183 DvPlaybackBC::insertToPlaybackBuffer(TPlaybackBuffer const&) [22]
0.04 0.11 4029401/4029401 std::deque<TTsPacket, std::allocator<TTsPacket> >::front() [25]
writeToIPOutput hierarchy:
[3] 36.8 0.92 1.00 1 DvIPPlaybackBC::writeToIPOutput() [3]
0.31 0.00 1129444/1129444 TPlaybackBuffer::operator=(TPlaybackBuffer const&) [13]
0.01 0.18 579235/1155128 std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::push_back(TPlaybackBuffer const&) [8]
0.03 0.10 1135318/1135318 std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::pop_front() [27]
I guess writeToIPOutput spends too much time on assignment. I can work on that. But topupBuffer spends its time in std::deque.
Is that the correct interpretation of the profile output?
If so, then is using a different container going to be more efficient and, if so, which one?
Thank you
EDIT I: The explanatory notes at the end of the call tree say:
% time This is the percentage of the `total' time that was spent
in this function and its children. Note that due to
different viewpoints, functions excluded by options, etc,
these numbers will NOT add up to 100%.
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this
function by its children.
So looking at bufferizeTsPackets, 1.15 is spent in its children, of which 0.30 + 0.30 + 0.11 = 0.71 is spent in different deque methods (pop_front, size, front). Right? So 0.71 is over half of the total time (1.15) spent in the children (?)
Upvotes: 3
Views: 3234
Reputation: 57749
A more efficient structure would be to implement a circular queue (ring buffer) using an array.
Since arrays are fixed size, you either have to make the array large enough that there is no data overrun, or only store the last N values, where N is the capacity of the buffer.
Many embedded systems use arrays to avoid the memory fragmentation issues caused by dynamic memory allocation.
If your array is small enough, it can fit in the processor's data cache, which speeds up computation.
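A minimal sketch of such a fixed-array ring buffer is below. It is written for exactly one producer and one consumer, which matches the two-thread setup in the question; the class name, the capacity parameter N and the use of std::atomic indices are my own illustrative choices, not part of the original code:

    #include <array>
    #include <atomic>
    #include <cstddef>

    // Fixed-capacity ring buffer for one producer thread and one consumer thread.
    // One slot is kept empty to distinguish "full" from "empty", so the usable
    // capacity is N - 1.
    template <typename T, std::size_t N>
    class RingBuffer {
    public:
        bool push(const T& item)            // producer thread only
        {
            const std::size_t head = head_.load(std::memory_order_relaxed);
            const std::size_t next = (head + 1) % N;
            if (next == tail_.load(std::memory_order_acquire))
                return false;               // full: drop or retry, caller's choice
            buf_[head] = item;
            head_.store(next, std::memory_order_release);
            return true;
        }

        bool pop(T& out)                    // consumer thread only
        {
            const std::size_t tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire))
                return false;               // empty
            out = buf_[tail];
            tail_.store((tail + 1) % N, std::memory_order_release);
            return true;
        }

    private:
        std::array<T, N> buf_{};
        std::atomic<std::size_t> head_{0};  // next index to write
        std::atomic<std::size_t> tail_{0};  // next index to read
    };

Usage would be along the lines of declaring RingBuffer<TTsPacket, 8192> fifo; once, with the topup thread calling push and the writer thread calling pop. Nothing is allocated after construction, which takes the deque's internal node management out of the hot path.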
Upvotes: 2