Efficiently pipe in-memory file contents to a command line command

Question

I have a Ruby script that, at one point, holds an in-memory file that may or may not be backed by an entry in the filesystem. The naive solution would be to create a tempfile when no filesystem entry exists, but the command-line program would then read the file into memory a second time.

Ideally, I would like to avoid reading the file into memory more than once since it could potentially be quite large.

Now, the command does accept piped input, so I thought this might be a good solution, but I cannot find a way to pipe a Ruby File object's contents into a process started from the command line.

I'm also open to other recommendations if I'm coming at this from the wrong direction. The files not backed by a filesystem entry are being read from a remote HTTP stream.

matt · Accepted Answer

One way would be to read the IO's contents into a string, and then use something like Kernel#open (with a |), IO::popen or open3 to create the subprocess and write the contents to the subprocess's stdin:

f = the_file_or_io_object
data = f.read

IO::popen('the_command', 'r+') do |io|
  io.write data
  io.close_write
  puts io.read
end

Although this avoids writing the file to disk (unless it is already there, e.g. as a tempfile), it involves reading the file contents into memory and then passing them to the subprocess, so they are in memory twice. If you want to avoid that, you could use fork (if your system has it) and reopen:

# f as before, no need to read it in this time

pid = fork do
  $stdin.reopen f
  # Now stdin is the file, so when the command is run it will see 
  # it on its stdin
  exec 'the_command'
end

Process.wait pid

If you’re on Windows you probably won’t have fork, so you could try spawn, redirecting stdin:

pid = spawn 'the_command', :in => f

Process.wait pid
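
The fork and spawn approaches rely on f being backed by a real file descriptor. As far as I know, that is not the case for something like a StringIO, and it may not be for a body read from an HTTP stream. For such a source, one option is to copy it into the subprocess's stdin in chunks with IO.copy_stream, so the whole contents never have to sit in a single Ruby string. Here is a minimal sketch under those assumptions; pipe_to_command is a hypothetical helper name, and cat merely stands in for the real command (so this assumes a Unix-like system):

```ruby
require 'stringio'

# Hypothetical helper: streams +source+ (any IO-like object) into the stdin
# of +command+ and returns whatever the command wrote to its stdout.
# The command is given as separate arguments, so no shell is involved.
def pipe_to_command(source, *command)
  IO.popen(command, 'r+') do |io|
    # Write from a separate thread so that a command which produces output
    # while still consuming input cannot stall on a full pipe buffer.
    writer = Thread.new do
      IO.copy_stream(source, io) # copies in chunks, not as one big string
      io.close_write             # signal end of input to the command
    end
    output = io.read
    writer.join
    output
  end
end

# Stand-in for an in-memory source with no filesystem entry.
source = StringIO.new("hello\nworld\n")
result = pipe_to_command(source, 'cat')
puts result
```

Note that this sketch still collects the command's output in memory via io.read; if the output is large too, it could be read in chunks or redirected elsewhere instead.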
