bash script to update postgres database

Question

I have some html data stored in text files right now. I recently decided to store the HTML data in the pgsql database instead of flat files. Right now, the 'entries' table contains a 'path' column that points to the file. I have added a 'content' column that should now store the data in the file pointed to by 'path'. Once that is complete, the 'path' column will be deleted. The problem that I am having is that the files contain apostrophes that throw my script out of whack. What can I do to correct this issue??

Here is the script

#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"

for f in $fileroot
do
psql $dbname $username -c "
  UPDATE entries
  SET content='`cat $f`'
  WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done

Note: The logic in the id=SELECT...FROM...WHERE path LIKE "" is not the issue. I have tested this with sample filenames in the pgsql environment.

The problem is that when I cat $f, any apostrophe in Edit: the contents of $f closes the SQL string, and I get a syntax error.

Craig Ringer · Accepted Answer

For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:

`sed "s/'/''/g" < "$f"`

to include the file contents instead of the cat, and for the second invocation in the LIKE where you appeared to intend to use the file name use:

${f/"'"/"''"/}

to include the literal string content of $f instead of executing it, and double the quotes. The ${varname/match/replace} expression is bash syntax and may not work in all shells; use:

`echo "$f" | sed "s/'/''/g"`

if you need to worry about other shells.

There are a bunch of other problems in that SQL.

You're trying to execute $f in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string.
Your subquery is also wrong, it lacks parentheses; (SELECT ...) not just SELECT.
Your LIKE expression is also probably not doing what you intended; you probably meant % instead of *, since % is the SQL wildcard.

If I also change backticks to $() (because it's clearer and easier to read IMO), fix the subquery syntax and add an alias to disambiguate the columns, and use a here-document instead passed to psql's stdin, the result is:

psql $dbname $username <<__END__
  UPDATE entries
  SET content=$(sed "s/'/''/g" < "$f")
  WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '$(echo "$f" | sed "s/'/''/g")');
__END__

The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on. If you aren't, change the regexp to escape apostrophes with \ instead of doubling them, and prefix the string with E, so O'Brien becomes E'O\'Brien'. In modern PostgreSQL it'd instead become 'O''Brien'.

In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.

For example, I'd write this as follows:

import os
import sys
import psycopg2

try:
        connstr = sys.argv[1]
        filename = sys.argv[2]
except IndexError as ex:
        print("Usage: %s connect_string filename" % sys.argv[0])
        print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
        sys.exit(1)


def load_file(connstr,filename):
        conn = psycopg2.connect(connstr)
        curs = conn.cursor()
        curs.execute("""
        UPDATE entries
        SET content = %s
        WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
        """, (filename, open(filename,"rb").read()))
        curs.close()

if __name__ == '__main__':
        load_file(connstr,filename)

Note the SQL wildcard % is doubled to escape it, so it results in a single % in the final SQL. That's because Python is using % as its format-specifier so a literal % must be doubled to escape it.

You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over the list of all file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql scripting; you have to use bash co-process as shown here ... and it isn't worth the hassle.

bash script to update postgres database

Answers (2)

Related Questions