Maxime Laval
Maxime Laval

Reputation: 4378

Insert stuff at every nth occurence of a character in a file

I have a file containing queries, each query ends with a ";", I want to add a "COMMIT;BEGIN;" every 100 queries. Queries can take more than 1 line.

For instance:


    INSERT INTO table
    VALUES(...);
    DELETE FROM table WHERE ...;
    UPDATE table
    SET ...;

So I want to replace every 100th ";" with "COMMIT;BEGIN;" (I know I have to add a BEGIN; at the beginning of the file and do something at the end but it's easy)

I need to do that in a shell script but I am not a Linux expert, is it better to use sed or awk (also the file size can be huge like 4GB)? I know the basics for those commands but I don't know if I can do what I want here...

Thanks!

Upvotes: 2

Views: 993

Answers (4)

potong
potong

Reputation: 58381

This might work for you (GNU sed);

sed ':a;$!{N;ba};s/^\([^;]*\(;[^\n][^;]*\)*;\s*$\)\{100\}/&\nCOMMIT;BEGIN;/mg' file

Upvotes: 0

brandizzi
brandizzi

Reputation: 27050

I would recommend not to try to replace the semicolons. Instead, "parse" all the command and put BEGIN and COMMIT around the bunch of commands you want to commit. This is fairly easy with sed. If you have this file:

$ cat my.sql 
INSERT INTO table VALUES (1);
INSERT INTO table VALUES (2);
INSERT INTO table VALUES (3);
INSERT INTO table VALUES (4);
INSERT INTO table VALUES (5);
INSERT INTO table VALUES (6);
INSERT INTO table VALUES (7);
INSERT INTO table VALUES (8);
INSERT INTO table VALUES (9);
INSERT INTO table VALUES (10);

just run this command:

$ sed -n 'H;${x;s/\([^;]*;\)\{,3\}/BEGIN;&\nCOMMIT;\n\n/g;p}' my.sql 
BEGIN;
INSERT INTO table VALUES (1);
INSERT INTO table VALUES (2);
INSERT INTO table VALUES (3);
COMMIT;

BEGIN;
INSERT INTO table VALUES (4);
INSERT INTO table VALUES (5);
INSERT INTO table VALUES (6);
COMMIT;

BEGIN;
INSERT INTO table VALUES (7);
INSERT INTO table VALUES (8);
INSERT INTO table VALUES (9);
COMMIT;

BEGIN;
INSERT INTO table VALUES (10);
COMMIT;

(Here I am using 3 as the size of the block for clarity. For "embracing" blocks of 100 commands, replace \{,3\} with \{,100\})

What does it do?

  • First, we inhibit the printing of lines with -n. Now sed will only print lines when we order it to print them explicitly.

  • Now, for each line, we append the line to the hold space with H.

  • At the last line (address $) we execute a block of commands (beginning at { and ending at }). The first command, x, swaps the content of the hold space (which now contains all the file) and the pattern space.

  • After this, we replace with s/// n (0 < n <= 3) series of characters (none being ;) followed by ; with the string BEGIN;, the matched block of commands (represented by &) and the string \nCOMMIT;\n\n (with newlines for better readability.

  • At last, we print the content of the pattern space with p.

Upvotes: 1

Dennis Williamson
Dennis Williamson

Reputation: 359985

If the semicolons are at the end of the line:

awk '{print} /;$/ && ! (count++%100) {print "COMMIT; BEGIN;"}' inputfile

If it's important that precisely 100 semicolons are counted and they may be anywhere in a line it can be done but it gets a bit more complicated.

Upvotes: 0

William Pursell
William Pursell

Reputation: 212238

If you can guarantee that the ; is at the end of the line, or if you don't really care about lines with multiple ;'s, the easy solution (untested) is:

awk '/;/{ count+=1 } {print} count==100 { print "COMMIT; BEGIN"; count=0 }'

Upvotes: 1

Related Questions