Max

Reputation: 2859

Why is this benchmark code using such high CPU?

The code below works: it sends all the correct data and receives the correct data.

When I use it to benchmark a very fast server, the benchmark's CPU usage is ~10%. However, when I benchmark a slow server, that rises to ~50% – the same as the server I'm benchmarking/stress testing*.

That is going by what top is reporting.

Why would it use so much CPU? I suspect I'm misusing poll, but I'm not sure how.

The CPU time for the slow server is 4x that of the benchmark, while for the fast server it is 7x that of the benchmark.

int flags = fcntl(sockfd, F_GETFL, 0);
assert(flags != -1);
assert(fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) != -1);

int32 red = 0;
struct pollfd pollfd = {
    .fd = sockfd,
    .events = POLLIN | POLLOUT
};
do {
    assert(poll(&pollfd, 1, -1) == 1);
    if (pollfd.revents & POLLOUT) {
        int n;
        while ((n = send(sockfd, buf__+bufOffset, bufLength-bufOffset, MSG_NOSIGNAL)) > 0) {
            bufOffset += n;
            if (n != bufLength-bufOffset)
                break;
        }
        assert(!(n == -1 && errno != EAGAIN && errno != EWOULDBLOCK));
    }

    if (pollfd.revents & POLLIN) {
        int r;
        while ((r = read(sockfd, recvBuf, MIN(recvLength-red, recvBufLength))) > 0) {
            // assert(memcmp(recvBuf, recvExpectedBuf+red, r) == 0);
            red += r;
            if (r != MIN(recvLength-red, recvBufLength))
                break;
        }
        assert(!(r == -1 && errno != EAGAIN && errno != EWOULDBLOCK));
    }
} while (bufOffset < bufLength);

assert(fcntl(sockfd, F_SETFL, flags & ~O_NONBLOCK) != -1);
int r;
while ((r = read(sockfd, recvBuf, MIN(recvLength-red, recvBufLength))) > 0) {
    // assert(memcmp(recvBuf, recvExpectedBuf+red, r) == 0);
    red += r;
}
assert(fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) != -1);

assert(red == recvLength);

int r = read(sockfd, recvBuf, 1);
assert((r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) || r == 0);

* (I'm running both the benchmark and the server on the same machine, for now. Communication is over TCP.)

Upvotes: 0

Views: 212

Answers (3)

Max

Reputation: 2859

Problem solved.

It wasn't misrepresented CPU usage exactly. The inefficient server was sending 8-byte packets with TCP_NODELAY, so I was receiving millions of poll notifications just to read 8 bytes at a time. It turns out the read(2) call is rather expensive, and calling it tens of thousands of times per second was enough to make "time spent in system mode" rocket to ~56%, which was added to "time spent in user mode" to produce the very high CPU usage.
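
For anyone who wants to confirm a split like that, here is a minimal sketch (not part of the original benchmark) that uses getrusage(2) to print user vs. system CPU time for the calling process:

#include <stdio.h>
#include <sys/resource.h>

/* Print the user and system CPU time consumed so far by this process.
 * A large system component relative to user time points at syscall
 * overhead, e.g. huge numbers of tiny read(2) calls. */
static void print_cpu_split(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("user: %ld.%06lds  system: %ld.%06lds\n",
               (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
               (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }
}

Calling something like this at the end of a run makes the user/system split visible without having to read it off top.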

Upvotes: 0

user2404501

Reputation:

So if I've finally understood this, you're comparing the ratio of %CPU reported by top to the ratio of the rate of increase of TIME+ reported by top, and they don't agree. (It would have been easier if you'd said which columns you were reading from!) As far as I can tell, both are calculated from the same fields in the underlying /proc data, so it shouldn't be possible for them to disagree by much.

And I can't replicate it. I put your code into a test program and ran it with no modifications other than fixing the compilation error caused by the redeclaration of int r and adding what I believe to be reasonable declarations for everything you left out. I connected it to a server that reads lines from the client and eats a little bit of CPU after each one before sending a line back. The result was that top showed %CPU at around 99 for the server and 2 for the client, and about a 50-to-1 ratio in the TIME+ column.
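
A simplified sketch of that kind of test server, for anyone who wants to reproduce the setup (illustrative only; the port number and the amount of busy work are arbitrary, and error handling is minimal):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(12345);            /* arbitrary test port */
    bind(lfd, (struct sockaddr *)&addr, sizeof addr);
    listen(lfd, 1);

    int cfd = accept(lfd, NULL, NULL);
    char buf[4096];
    ssize_t n;
    while ((n = read(cfd, buf, sizeof buf)) > 0) {
        /* "Eat a little bit of CPU" after each read. */
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 5000000UL; i++)
            x += i;
        write(cfd, "ok\n", 3);               /* send a short line back */
    }
    close(cfd);
    close(lfd);
    return 0;
}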

I find nothing wrong with the use of poll.

I don't like your use of assert, though: when assertions are compiled out, the program is going to be missing a lot of important syscalls.
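
To make that concrete, here is a small sketch (not taken from the question) of how a syscall wrapped in assert vanishes when built with -DNDEBUG, and the usual way to keep it:

#include <assert.h>
#include <fcntl.h>

/* With -DNDEBUG the whole expression is discarded, so the fcntl()
 * inside the assert never runs and the socket is never switched
 * to non-blocking mode. */
static void set_nonblock_fragile(int sockfd, int flags)
{
    assert(fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) != -1);
}

/* Safer: always perform the syscall; only the check is optional. */
static void set_nonblock_safe(int sockfd, int flags)
{
    int rc = fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
    assert(rc != -1);
    (void)rc; /* silence unused-variable warning under NDEBUG */
}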

Upvotes: 0

abligh

Reputation: 25129

The reason is that you are busy-waiting. If read and write return EAGAIN or EWOULDBLOCK, you call them again continuously. Add a select that waits until the socket is actually ready for reading or writing before retrying.
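
A minimal sketch of that idea (illustrative only, using poll rather than select, and not taken from the question's code): when a send reports EAGAIN or EWOULDBLOCK, block until the socket is writable before retrying instead of looping immediately.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send some bytes on a non-blocking socket, sleeping in poll()
 * whenever the kernel's send buffer is full, so the caller never
 * spins on EAGAIN/EWOULDBLOCK. */
static ssize_t send_when_ready(int sockfd, const void *buf, size_t len)
{
    for (;;) {
        ssize_t n = send(sockfd, buf, len, MSG_NOSIGNAL);
        if (n >= 0)
            return n;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                        /* real error */
        struct pollfd pfd = { .fd = sockfd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0)            /* wait, don't spin */
            return -1;
    }
}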

Upvotes: 1
