markus_p

Reputation: 574

Fork Process / Read Write through pipe SLOW

ANSWER

https://stackoverflow.com/a/12507520/962890

It was so trivial: the args! But I received lots of good information. Thanks to everyone.

EDIT

Link to GitHub: https://github.com/MarkusPfundstein/stream_lame_testing

ORIGINAL POST

I have some questions regarding IPC through pipes. My goal is to receive MP3 data over a TCP/IP stream, pipe it through LAME to decode it to WAV, do some math and store it on disk (as a WAV). I am using non-blocking I/O for the whole thing. What irritates me a bit is that the TCP/IP read is much faster than the pipeline through LAME. When I send a ~3 MB MP3, the file gets read on the client side in a couple of seconds. In the beginning, I can also write to the stdin of the LAME process; then it stops accepting writes, my process reads the rest of the MP3, and once that is finished I can write to LAME again. 4096 bytes take approx. 1 second (to write to and read from LAME). This is pretty slow, because I want to decode to WAV at a minimum of 128 kbps.

The OS is Debian with a 2.6 kernel, running on this microcomputer:

https://www.olimex.com/dev/imx233-olinuxino-maxi.html

65 MB RAM, 400 MHz

ulimit -a | grep pipe returns 512 x 8, i.e. 4096, which is OK. It's a 32-bit system.

The weird thing is that

my_process | lame --decode --mp3input - output.wav

goes very fast.

Here is my fork_lame code (which should essentially connect stdout of my process to stdin of LAME and vice versa):

static char * const k_lame_args[] = {
  "--decode",
  "--mp3input",
  "-",
  "-",
  NULL
};

static int
fork_lame()
{
  int outfd[2];
  int infd[2];
  int npid;
  pipe(outfd); /* Where the parent is going to write to */
  pipe(infd); /* From where parent is going to read */
  npid = fork();
  if (npid == 0) {
    close(STDOUT_FILENO);
    close(STDIN_FILENO);
    dup2(outfd[0], STDIN_FILENO);
    dup2(infd[1], STDOUT_FILENO);
    close(outfd[0]); /* Not required for the child */
    close(outfd[1]);
    close(infd[0]);
    close(infd[1]);
    if (execv("/usr/local/bin/lame", k_lame_args) == -1) {
      perror("execv");
      return 1;
    }
  } else {
    s_lame_pid = npid;
    close(outfd[0]); /* These are being used by the child */
    close(infd[1]);
    s_lame_fds[WRITE] = outfd[1];
    s_lame_fds[READ] = infd[0];
  }
  return 0;
}

These are the read and write functions. Please note that in write_lame_in, when I write to stderr instead of s_lame_fds[WRITE], the output is nearly immediate, so it's definitely the pipe through LAME. But why?

static int
read_lame_out() 
{
  char buffer[READ_SIZE];
  memset(buffer, 0, sizeof(buffer));
  int i;
  int br = read(s_lame_fds[READ], buffer, sizeof(buffer) - 1);
  fprintf(stderr, "read %d bytes from lame out\n", br);
  return br;
}

static int
write_lame_in()
{
  int bytes_written;
  //bytes_written = write(2, s_data_buf, s_data_len);
  bytes_written = write(s_lame_fds[WRITE], s_data_buf, s_data_len);
  if (bytes_written > 0) {
    //fprintf(stderr, "%d bytes written\n", bytes_written);
    s_data_len -= bytes_written;
    fprintf(stderr, "data_len write: %d\n", s_data_len);
    memmove(s_data_buf, s_data_buf + bytes_written, s_data_len);
    if (s_data_len == 0) {
      fprintf(stderr, "finished\n");
    }
  } 

  return bytes_written;
}

static int
read_tcp_socket(struct connection_s *connection)
{
  char buffer[READ_SIZE];
  int bytes_read;
  bytes_read = connection_read(connection, buffer, sizeof(buffer)-1);
  if (bytes_read > 0) {
    //fprintf(stderr, "read %d bytes\n", bytes_read);
    if (s_data_len + bytes_read > sizeof(s_data_buf)) {
      fprintf(stderr, "BUFFER OVERFLOW\n");
      return -1;
    } else {
      memcpy(s_data_buf + s_data_len,
             buffer,
             bytes_read);
      s_data_len += bytes_read;
    }
    fprintf(stderr, "data_len: %d\n", s_data_len);
  }
  return bytes_read;
}

The select stuff is pretty basic select logic. All file descriptors are non-blocking, of course.

Does anyone have any idea? I'd really appreciate any help ;-)

Upvotes: 2

Views: 2479

Answers (3)

Nominal Animal

Reputation: 39406

Oops! Did you check your LAME output?

Looking at your code, in particular

static char * const k_lame_args[] = {
  "--decode",
  "--mp3input",
  "-",
  "-",
  NULL
};

and

if (execv("/usr/local/bin/lame", k_lame_args) == -1) {

means you are accidentally omitting the --decode flag, because it becomes argv[0] for LAME instead of the first argument (argv[1]). You should use

static char * const k_lame_args[] = {
  /* argv[0] */  "lame",
  /* argv[1] */  "--decode",
  /* argv[2] */  "--mp3input",
  /* argv[3] */  "-",
  /* argv[4] */  "-",
                 NULL
};

instead.

I think you are seeing a slowdown because you're accidentally recompressing the MP3 audio. (I noticed this just a minute ago, so haven't checked if LAME does that if you omit the --decode flag, but I believe it does.)
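To illustrate the argv[0] convention, here is a minimal sketch of a fork/exec helper that wires a child's stdin and stdout to two pipes. The name spawn_filter is hypothetical, and the sketch uses execvp (PATH lookup on argv[0]) rather than execv with a hard-coded path, which is an assumption made for illustration and not taken from the question's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child whose stdin/stdout are connected to
 * the parent through two pipes. argv[0] must be the program name; the
 * real options start at argv[1]. Returns the child's pid, or -1. */
static pid_t
spawn_filter(char *const argv[], int *to_child, int *from_child)
{
  int outfd[2]; /* parent writes, child reads */
  int infd[2];  /* child writes, parent reads */
  pid_t pid;

  assert(argv != NULL && argv[0] != NULL);
  if (pipe(outfd) == -1)
    return -1;
  if (pipe(infd) == -1) {
    close(outfd[0]);
    close(outfd[1]);
    return -1;
  }

  pid = fork();
  if (pid == -1) {
    close(outfd[0]);
    close(outfd[1]);
    close(infd[0]);
    close(infd[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: dup2 replaces stdin/stdout, then the pipe ends are dropped. */
    dup2(outfd[0], STDIN_FILENO);
    dup2(infd[1], STDOUT_FILENO);
    close(outfd[0]);
    close(outfd[1]);
    close(infd[0]);
    close(infd[1]);
    execvp(argv[0], argv);
    perror("execvp");
    _exit(127); /* never return into the parent's code path */
  }

  /* Parent: keep only the ends it uses. */
  close(outfd[0]);
  close(infd[1]);
  *to_child = outfd[1];
  *from_child = infd[0];
  return pid;
}
```

With this helper, the LAME invocation would pass an array starting with "lame", followed by "--decode", "--mp3input", "-", "-" and a terminating NULL. Any other filter can be spawned the same way, for example { "cat", NULL }.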

Upvotes: 3

Nominal Animal

Reputation: 39406

It is possible there is some sort of a blocking issue wrt. nonblocking pipes (not really being nonblocking), causing your end to block until LAME consumes the data.

Could you try an alternative approach? Use normal, blocking pipes, and a separate thread (using pthreads), which has the singular purpose of writing data from a circular buffer to LAME. Your main thread then keeps filling the circular buffer from your TCP/IP connection, and can easily also track and report buffer levels -- very useful during development and debugging. I've had much better success with blocking pipes and threads than nonblocking pipes, in general.

In Linux, threads really do not have that much of an overhead, so you should be comfortable in using them even on embedded architectures. The only trick you must master is specifying a sensible stack size for the worker thread -- in this case 16384 bytes is quite likely enough -- because only the initial stack given to the process will automatically grow, whereas thread stacks are fixed and by default quite large.

Do you need example code?

Edited to add:

Your program probably receives data from the TCP/IP connection at a steady rate. However, LAME consumes the data in largeish chunks. In other words, the situation is like a car being towed, with the tow car jerking and stopping, and the towee jerking into it every time: both your process and LAME spend most of the time waiting for the other to receive or send more data.

Upvotes: 2

Julien Fouilhé

Reputation: 2658

First, those two close() calls are not required (actually, you shouldn't make them), because the two dup2() calls that follow will do it automatically:

close(STDOUT_FILENO);
close(STDIN_FILENO);
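As a small sketch of that point: dup2() closes the target descriptor itself if it is open, so a redirect needs no prior close(). The helper name redirect_fd is hypothetical. It also covers the corner case where both descriptors are already equal, in which dup2() does nothing and closing the source would be a mistake.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: make `target` refer to the same open file as
 * `src`, then drop `src`. dup2() silently closes `target` first if it
 * is open, so no explicit close(target) is needed beforehand.
 * Returns 0 on success, -1 on error. */
static int
redirect_fd(int src, int target)
{
  assert(src >= 0 && target >= 0);
  if (src == target)
    return 0; /* nothing to do, and src must stay open */
  if (dup2(src, target) == -1)
    return -1;
  return close(src);
}
```

In the child branch of fork_lame, this would be used as redirect_fd(outfd[0], STDIN_FILENO) and redirect_fd(infd[1], STDOUT_FILENO), followed by closing the two unused pipe ends.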

Upvotes: 1
