Reputation: 1381
I have some html data stored in text files right now. I recently decided to store the HTML data in the pgsql database instead of flat files. Right now, the 'entries' table contains a 'path' column that points to the file. I have added a 'content' column that should now store the data in the file pointed to by 'path'. Once that is complete, the 'path' column will be deleted. The problem that I am having is that the files contain apostrophes that throw my script out of whack. What can I do to correct this issue??
Here is the script
#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"
for f in $fileroot
do
psql $dbname $username -c "
UPDATE entries
SET content='`cat $f`'
WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done
Note: The logic in the id=SELECT...FROM...WHERE path LIKE ""
is not the issue. I have tested this with sample filenames in the pgsql environment.
The problem is that when I cat $f
, any apostrophe in Edit: the contents of $f closes the SQL string, and I get a syntax error.
Upvotes: 1
Views: 8362
Reputation: 1381
In the original post, I made it sound like there were apostrophes in the filename represented by $f. This was NOT the case, so a simple echo "$f"
was able to fix my issue.
To make it more clear, the contents of my files were formatted as html snippets, typically something like <p>Blah blah <b>blah</b>...</p>
. After trying the solution posted by Craig, I realized I had used single quotes in some anchor tags, and I did NOT want to change those to something else. There were only a few files where this violation occurred, so I just changed these to double quotes by hand. I also realized that instead of escaping the apostrophes, it would be better to convert them to '
Here is the final script that I ended up using:
dbname="myDB"
username="username"
fileroot="/path/to/files/*"
for f in $fileroot
do
psql $dbname $username << __END__
UPDATE entries
SET content='$(sed "s/'/\'/g" < "$f")'
WHERE id=(SELECT e.id FROM entries e WHERE path LIKE '%$(echo "$f")');
__END__
done
The format coloring on here might make it look like the syntax is incorrect, but I have verified that it is correct as posted.
Upvotes: 0
Reputation: 324465
For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:
`sed "s/'/''/g" < "$f"`
to include the file contents instead of the cat
, and for the second invocation in the LIKE
where you appeared to intend to use the file name use:
${f/"'"/"''"/}
to include the literal string content of $f
instead of executing it, and double the quotes. The ${varname/match/replace}
expression is bash
syntax and may not work in all shells; use:
`echo "$f" | sed "s/'/''/g"`
if you need to worry about other shells.
There are a bunch of other problems in that SQL.
$f
in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string. (SELECT ...)
not just SELECT
. LIKE
expression is also probably not doing what you intended; you probably meant %
instead of *
, since %
is the SQL wildcard.If I also change backticks to $()
(because it's clearer and easier to read IMO), fix the subquery syntax and add an alias to disambiguate the columns, and use a here-document instead passed to psql
's stdin, the result is:
psql $dbname $username <<__END__
UPDATE entries
SET content=$(sed "s/'/''/g" < "$f")
WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '$(echo "$f" | sed "s/'/''/g")');
__END__
The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on
. If you aren't, change the regexp to escape apostrophes with \
instead of doubling them, and prefix the string with E
, so O'Brien
becomes E'O\'Brien'
. In modern PostgreSQL it'd instead become 'O''Brien'
.
In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.
For example, I'd write this as follows:
import os
import sys
import psycopg2
try:
connstr = sys.argv[1]
filename = sys.argv[2]
except IndexError as ex:
print("Usage: %s connect_string filename" % sys.argv[0])
print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
sys.exit(1)
def load_file(connstr,filename):
conn = psycopg2.connect(connstr)
curs = conn.cursor()
curs.execute("""
UPDATE entries
SET content = %s
WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
""", (filename, open(filename,"rb").read()))
curs.close()
if __name__ == '__main__':
load_file(connstr,filename)
Note the SQL wildcard %
is doubled to escape it, so it results in a single %
in the final SQL. That's because Python is using %
as its format-specifier so a literal %
must be doubled to escape it.
You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over the list of all file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql
scripting; you have to use bash co-process as shown here ... and it isn't worth the hassle.
Upvotes: 2