Henry
Henry

Reputation: 71

Looking for help to send queries to Postgres from Perl

I am trying to write a Perl script to bring up a Postgres client once and send several files through the client, captures the output separately for each file.

If I do something like:

system ("cat $query1.sql | psql -p 2070 super &> $HOME/Results1.txt");
system ("cat $query2.sql | psql -p 2070 super &> $HOME/Results2.txt");

then Perl will start up the client for each query. As I'll be running hundreds and maybe thousands of queries I want to skip the overhead for starting up all but the first client.

I think I should be able to bring up the Postgres client via Open2, but it hangs when I try. I'm doing this on SUSE Linux machine with Perl 5.10.0.

Here's my code:

use IPC::Open2;

use IO::Handle;

our $pid = open2(*CHILDOUT, *CHILDINT, '../installdir/bin/psql -p 2070 super');
print STDOUT $pid;

print CHILDINT "cat $dumpR5KScript";
print STDOUT 'Sent the commands';

$output = <CHILDOUT>;

close(CHILDIN);
close(CHILDOUT);

It appears to be hanging with "open2" because I never see the pid.

Can someone point out what I am doing wrong so my call to open2 doesn't hang?

And if anyone has advice on the larger issue of the best way of bringing up a Postgres client and running queries through it I would be grateful.

Upvotes: 2

Views: 298

Answers (1)

January
January

Reputation: 17140

You have already been told to use DBI in the comments to your post, and it will be a good thing if you do. Formatting is much easier than fiddling with the IPC and stitching together a sort of an API between Perl and a command line database client, parsing the output and formatting the input.

Regarding your problem, however:

  1. it should be \*CHILDIN instead of *CHILDIN (reference to typeglob rather than typeglob)

  2. and anyways in such a case you should use variables instead of typeglobs and ancient idioms:

    my ( $childout, $childin ) ;
    our $pid = open2( $childout, $childin, '../installdir/bin/psql -p 2070 super');
    print STDOUT $pid;
    
  3. Please read the documentation for IPC::Open2.

  4. Also, better to use open3 to handle STDERR as well

  5. Finally, I do not know the postgress client, but the possibility of a deadlock (which you are experiencing) is very real with open2:

This whole affair is quite dangerous, as you may block forever. It assumes it's going to talk to something like bc, both writing to it and reading from it. This is presumably safe because you "know" that commands like bc will read a line at a time and output a line at a time. Programs like sort that read their entire input stream first, however, are quite apt to cause deadlock.

The big problem with this approach is that if you don't have control over source code being run in the child process, you can't control what it does with pipe buffering. Thus you can't just open a pipe to cat -v and continually read and write a line from it.

The IO::Pty and Expect modules from CPAN can help with this, as they provide a real tty (well, a pseudo-tty, actually), which gets you back to line buffering in the invoked command again.

Upvotes: 1

Related Questions