Reputation: 68426
I am in the process of converting the schema of an existing database to Postgresql. I want to automate as much of this as possible, to avoid manual errors.
The original database uses CLUSTERED indices, however PG does not (really) have clustered indices. I want to write a bash script to replace all occurences of CLUSTERED indices, to a postgresql equivalent.
Essentially, I want to SUBSTITUTE lines like this:
CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
with a 2 line replacement like this:
CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
CLUSTER foobar;
I think I have worked out the matching logic, I just need help with the regex, as I am not very familiar. The matching logic that seems to work is as follows:
Could someone help me in incorporating this logic into a bash script so I can pass it the file to be processed?
Incidentally, I thought I could possibly use sed
to do this, but I don't know if it will be easier (i.e. easier to understand) to write a bash script, instead of attempting to do this as a one liner in sed - but I am open to suggestions.
Upvotes: 0
Views: 294
Reputation: 10039
sed --posix "/CREATE clustered INDEX/ {
s/ *clustered */ /
s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/
}"
--posix
to be available for non GNU also
I make another regex than bob Schuster (very good one) just to have an alternative that allow more modification on the line if needed for other purpose like inserting comment in script.
here is the session on a cygwin bash (version oneline)
$ cat sample.txt
CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
blabla;
$ sed --posix "/CREATE clustered INDEX/ {s/ *clustered */ /;s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/;}" sample.txt
CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); CLUSTER foobar;
blabla;
Upvotes: 1
Reputation: 78463
Be wary that clustering in Postgres isn't necessarily the same as in the original database you're using (I presume SQL Server?). Per the docs:
Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. (...))
http://www.postgresql.org/docs/current/static/sql-cluster.html
This means that replacing create clustered index on table (...);
with create index on table (...); cluster table;
isn't going to work the way you're expecting.
In light of that, stick to removing clustered
using sed, or make sure you add the additional using index
part. If the latter, you'll also want to add an extra cluster table
at the very end of the import, to actually cluster the data.
Methinks you ought to remove the clustered references altogether, and worry about adding them at the very end of your import, either manually or by generating an additional SQL file as part of or prior to the removal script.
Upvotes: 0
Reputation: 58430
This might work for you (GNU sed):
sed -r 's/^(\s*CREATE) (cluster)ed(.* (\S+) \(.*\);)\s*$/\1\3\n\U\2 \L\4;/' file
Upvotes: 0
Reputation: 21
You could try sed, for example:
sed -r 's/^\s*(CREATE\s*)clustered\s*(INDEX.*ON\s*)(\w*)(\s+\(.*;)$/\1\2\3\4\nCLUSTER \3;/gi' original.txt > updated.txt
I followed your guidelines which is why the regex is a bit bulky, but you can revise the regex based on the actual content of your input file and whether you want to preserve extraneous spaces.
One good place to experiment with regex is: http://regex101.com
Upvotes: 0