Benjamin Pierce
Benjamin Pierce

Reputation: 517

How to capture subprocess output on both stderr and stdout in OCaml

I'm looking at the following OCaml code, which attempts to open a subprocess, feed it some input, and then collect all of the output it produces on either stdout or stderr. But the fact that it first reads everything from stdout and then reads everything from stderr seems fishy -- if the subprocess happens to write a lot of stuff to stderr, it seems like the result will be a deadlock.

let rec output_lines (output : string list) (chan : out_channel) : unit =
  ignore (List.map (output_string chan) output); flush chan

let async_command
    (name : string)
    (arguments : string list)
  : (in_channel * out_channel * in_channel) =
  Unix.open_process_full
    (name ^ " " ^ (String.concat " " arguments))
    (Unix.environment ())

let sync_command
    (name : string)
    (arguments : string list)
    (input : string list) : (string list * string list) =
  let (o, i, e) = async_command name arguments in
  output_lines input i;
  let out, err = (input_lines o, input_lines e) in
  let status = Unix.close_process_full (o, i, e) in
  begin match status with
    | Unix.WEXITED   x -> if x != 0 then raise (Shell_error (unlines err))
    | Unix.WSIGNALED x -> if x = Sys.sigint then raise Sys.Break
    | Unix.WSTOPPED  x -> if x = Sys.sigint then raise Sys.Break
  end;
  (out, err)

How should this be fixed? (Better yet, what library should I use where this functionality is already implemented?)

Upvotes: 1

Views: 1554

Answers (1)

ivg
ivg

Reputation: 35210

In general, you should use some sort of polling over a set of descriptors along with the non-blocking input to pipe information from the to the output. This will, of course, garbage the output, i.e., intermix it in an arbitrary order. The polling can be performed with the Unix.select function. A non-blocking io can be enabled by setting the O_NONBLOCK flag when a file is opened, or with the Unix.set_nonblock function.

With all these said, I would like to stress that writing non-blocking code is not a trivial exercise. Especially if it is done in raw, poll/read/write loop. Many modern languages/runtimes have libraries that decouple this cycle and provide a generic interface in the form of callbacks. OCaml was one of the pioneers in this direction, with its Lwt library. We also have the Async library, that is slightly bigger but shares the same design. I wouldn't advise one or another, as it is opinion based, but, as an anecdote, your namesake Benjamin Pierce was using Lwt for his Unison project :)

A complete example without Lwt (Async)

We can burrow an example from the Unix System Programming book

(* The multiplex function takes a descriptor open on the serial 
   port and two arrays of descriptors of the same size, one containing
   pipes connected to the standard input of the user processes, the other
   containing pipes connected to their standard output. *)
open Unix;;

let rec really_read fd buff start length =
  if length <= 0 then () else
    match read fd buff start length with
    | 0 -> raise End_of_file
    | n -> really_read fd buff (start+n) (length-n);;

let buffer = String.create 258;;

let multiplex channel inputs outputs =
  let input_fds = channel :: Array.to_list inputs in
  try
    while true do
      let (ready_fds, _, _) = select input_fds [] [] (-1.0) in
      for i = 0 to Array.length inputs - 1 do
        if List.mem inputs.(i) ready_fds then begin
          let n = read inputs.(i) buffer 2 255 in
          buffer.[0] <- char_of_int i;
          buffer.[1] <- char_of_int n;
          ignore (write channel buffer 0 (n+2));
          ()
        end
      done;
      if List.mem channel ready_fds then begin
        really_read channel buffer 0 2;
        let i = int_of_char(buffer.[0])
        and n = int_of_char(buffer.[1]) in
        if n = 0 then close outputs.(i) else
        begin
          really_read channel buffer 0 n;
          ignore (write outputs.(i) buffer 0 n);
          ()
        end
      end
    done
  with End_of_file -> () ;;

Note: You should use create_subprocess that returns the file descriptors, to implement non blocking IO.

Upvotes: 2

Related Questions