I am in the process of converting the schema of an existing database to Postgresql. I want to automate as much of this as possible, to avoid manual errors. The original database uses CLUSTERED indices, however PG does not (really) have clustered indices. I want to write a bash script to replace all occurences of CLUSTERED indices, to a postgresql equivalent. Essentially, I want to SUBSTITUTE lines like this: CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); with a 2 line replacement like this: CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); CLUSTER foobar; I think I have worked out the matching logic, I just need help with the regex, as I am not very familiar. The matching logic that seems to work is as follows: Find a line that starts with CREATE clustered INDEX (the line may begin with one or more non-newline whitespaces) store the name of the table (it follows one or more whitespace after the ON keyword) Remove the word clustered from the line matched in 1 above to create substitute text Append "\nCLUSTER $tablename" to the substitute text in step 3 above Replace matched line in step1 with the substitute text (obtained in step 4) Could someone help me in incorporating this logic into a bash script so I can pass it the file to be processed? Incidentally, I thought I could possibly use sed to do this, but I don't know if it will be easier (i.e. easier to understand) to write a bash script, instead of attempting to do this as a one liner in sed - but I am open to suggestions.

Reputation: 68426

bash script to parse (and replace) all occurences of a block of text in file

I am in the process of converting the schema of an existing database to Postgresql. I want to automate as much of this as possible, to avoid manual errors.

The original database uses CLUSTERED indices, however PG does not (really) have clustered indices. I want to write a bash script to replace all occurences of CLUSTERED indices, to a postgresql equivalent.

Essentially, I want to SUBSTITUTE lines like this:

CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);

with a 2 line replacement like this:

CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); CLUSTER foobar;

I think I have worked out the matching logic, I just need help with the regex, as I am not very familiar. The matching logic that seems to work is as follows:

Find a line that starts with CREATE clustered INDEX (the line may begin with one or more non-newline whitespaces)
store the name of the table (it follows one or more whitespace after the ON keyword)
Remove the word clustered from the line matched in 1 above to create substitute text
Append "\nCLUSTER $tablename" to the substitute text in step 3 above
Replace matched line in step1 with the substitute text (obtained in step 4)

Could someone help me in incorporating this logic into a bash script so I can pass it the file to be processed?

Incidentally, I thought I could possibly use sed to do this, but I don't know if it will be easier (i.e. easier to understand) to write a bash script, instead of attempting to do this as a one liner in sed - but I am open to suggestions.

Upvotes: 0

Answers (4)

NeronLeVelu

Reputation: 10039

sed --posix "/CREATE clustered INDEX/ {
   s/ *clustered */ /
   s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/
   }"

--posix to be available for non GNU also I make another regex than bob Schuster (very good one) just to have an alternative that allow more modification on the line if needed for other purpose like inserting comment in script.

here is the session on a cygwin bash (version oneline)

$ cat sample.txt
CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
blabla;

$ sed --posix "/CREATE clustered INDEX/ {s/ *clustered */ /;s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/;}" sample.txt
CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); CLUSTER foobar;
blabla;

Upvotes: 1

Denis de Bernardy

Reputation: 78463

Be wary that clustering in Postgres isn't necessarily the same as in the original database you're using (I presume SQL Server?). Per the docs:

Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. (...))

http://www.postgresql.org/docs/current/static/sql-cluster.html

This means that replacing create clustered index on table (...); with create index on table (...); cluster table; isn't going to work the way you're expecting.

In light of that, stick to removing clustered using sed, or make sure you add the additional using index part. If the latter, you'll also want to add an extra cluster table at the very end of the import, to actually cluster the data.

Methinks you ought to remove the clustered references altogether, and worry about adding them at the very end of your import, either manually or by generating an additional SQL file as part of or prior to the removal script.

Upvotes: 0

potong

Reputation: 58430

This might work for you (GNU sed):

sed -r 's/^(\s*CREATE) (cluster)ed(.* (\S+) \(.*\);)\s*$/\1\3\n\U\2 \L\4;/' file

Upvotes: 0

Bob Schuster

Reputation: 21

You could try sed, for example:

sed -r 's/^\s*(CREATE\s*)clustered\s*(INDEX.*ON\s*)(\w*)(\s+\(.*;)$/\1\2\3\4\nCLUSTER \3;/gi' original.txt > updated.txt

I followed your guidelines which is why the regex is a bit bulky, but you can revise the regex based on the actual content of your input file and whether you want to preserve extraneous spaces.

One good place to experiment with regex is: http://regex101.com

Upvotes: 0

bash script to parse (and replace) all occurences of a block of text in file

Answers (4)

Related Questions