Reputation: 6240

Stream buffering issue

The mod_rewrite documentation states that disabling input and output buffering in a rewrite program is a strict requirement.

Keeping that in mind, I've written a simple program (I know it lacks the EOF check, but this is not an issue here and it saves one condition check per loop):

#include <stdio.h>
#include <stdlib.h>
int main ( void )
{
    setvbuf(stdin,NULL,_IONBF,0);
    setvbuf(stdout,NULL,_IONBF,0);
    int character;
    while ( 42 )
    {
        character = getchar();
        if ( character == '-' )
        {
            character = '_';
        }
        putchar(character);
    }
    return 0;
}

After making some measurements I was shocked. It was over 9,000 times slower than the demo Perl script provided in the documentation:

#!/usr/bin/perl
$| = 1; # Turn off I/O buffering
while (<STDIN>) {
    s/-/_/g; # Replace dashes with underscores
    print $_;
}

Now I have two related questions:

Question 1. I believe the streams may be line buffered, since Apache sends a newline after each path. Am I correct? Switching my program to

setvbuf(stdin,NULL,_IOLBF,4200);
setvbuf(stdout,NULL,_IOLBF,4200);

makes it twice as fast as the Perl one. This should not hurt Apache's performance, should it?

Question 2. How can one write a program in C that uses unbuffered streams (like the Perl one) and performs as fast as the Perl one?

Upvotes: 0

Answers (2)

Piotr Praszmo

Reputation: 18320

When writing to a terminal, stdout is flushed after every line, so you always see the output right away. When writing to a file or, as in your case, a pipe, this automatic flush is disabled and stdout is fully buffered, because in those cases performance is usually more important.

This causes problems when processes have to interact with each other. The first program writes something, but it is not sent immediately; it is stored in a buffer. The second program waits for that data, while the first program waits for more data from the second one, resulting in a deadlock.

To avoid this, you need to flush all output before waiting for additional input. A simple fflush(stdout) before every read operation should be enough. Perl's $| = 1 has the same effect here: it flushes STDOUT after every print, so each result is sent before the next read. Nothing needs to be done with stdin.

If performance is critical and you only need to operate on single bytes, read and write data in big chunks using the unbuffered read() and write() system calls. For example:

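#include <unistd.h>

int main(void) {
    char buf[1024];
    ssize_t len;
    /* On a pipe, read() returns as soon as some data is available (up to
       sizeof(buf) bytes), 0 at EOF and -1 on error. */
    while ((len = read(0, buf, sizeof(buf))) > 0) {
        for (ssize_t i = 0; i < len; i++) {
            if (buf[i] == '-') {
                buf[i] = '_';
            }
        }
        /* write() may write fewer bytes than requested, so loop until done. */
        for (ssize_t off = 0; off < len; ) {
            ssize_t n = write(1, buf + off, (size_t)(len - off));
            if (n < 0) {
                return 1;
            }
            off += n;
        }
    }
    return 0;
}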

Upvotes: 0

DrC

Reputation: 7698

Question 1: You would have to look at the code. It could be line buffered, it could call fflush at the end of each request (or block of requests), or it could use write calls with a larger buffer. In any case, it won't be doing per-character I/O, which is what your program is doing.

Question 2: I suspect the main issue is on output. If you were to assemble the entire result in a buffer and write it out with one call, you would be faster. However, that just means you are doing the line buffering yourself instead of having the library take care of it for you. The key point is that with no buffering, each output call results in a system call, which has very high overhead. In theory, the same holds true for input, but I'm not sure the implementation wouldn't notice the available characters and buffer them anyway. The workaround is the same: read into a larger buffer and then take it apart yourself.

Personally, I'd avoid all the setvbuf stuff and just do an fflush at the end of each request.

Upvotes: 2
