Reputation: 605

Using bash, awk or sed to templatize a CSV file into a SQL file

I will have a CSV file (say, ids.csv) that I need to ETL into a SQL script (say, update_products.sql). The CSV will be headerless and will consist of comma-delimited numbers (product IDs in a database), for instance:

29294848,29294849,29294850,29294851,29294853,29294857,29294858,29294860,29294861,29294863,29294887,29294888,
29294889,29294890,29294891,29294892,29294895,29294897,29294898,29294899,29294901,29294903,29294912,29294916

Starting with a SQL "template" file (template.sql) that looks something like this:

UPDATE products SET quantity = 0 WHERE id = %ID%;

I'm looking for a way via bash, awk, sed (or any other type of shell scripting tool), to templatize %IDS% with the values in the CSV, hence turning the generated SQL into something like:

UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
... etc, for all the IDs in the CSV...

Super flexible here:

Don't care which tool gets the job done (awk, sed, bash, whatever...as long as I can run it from the command line)
Don't necessarily NEED a template file (template.sql) to start with, perhaps the solution can just "inject" this template into the script as an argument
Ideally it would read the input CSV but this is not a hard requirement, if the solution requires me pasting the contents of the CSV file into the script as an argument, I'm OK with that, but not thrilled...
Ideally it would generate an actual SQL file (update_products.sql) for me, but if we're limited to console output thats OK to (just not preferred)

Any ideas how I might be able to accomplish this?

Upvotes: 1

Answers (4)

David C. Rankin

Reputation: 84561

In addition to the answer by @suspectus which provides a nice use of printf to output each line wanted, a slightly more procedural use of awk incorporating a for loop over the fields would be:

awk -F, '{
    for (i=1;i<=NF;i++)
        print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
}' file.csv

Where the single rule simply loops over each of the comma-separated fields using string-concatenation to form the desired output. In detail the awk command:

awk -F, sets the field-separator (FS) equal to a comma to split the input,
for (i=1;i<=NF;i++) simply loops over each field, and
print "UPDATE products SET quantity = 0 WHERE id = " $i ";" outputs the wanted text incorporating the field within using string-concatenation.

Example Use/Output

With your data in file.csv (presumed to be a single line, but it really doesn't matter) your output would be:

$ awk -F, '{
>     for (i=1;i<=NF;i++)
>         print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
> }' file.csv
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;

Look things over and let me know if you have further questions.

Upvotes: 0

Léa Gris

Reputation: 19555

I propose to be safe rather than sorry.

May be deemed pedantic, but working with business database is serious matter.

So here it is based on @Paul Hodges's answer

#!/usr/bin/env bash

{
  # Use the prepared statements `zeroproduct` 
  # to protect against SQL injections
  printf 'PREPARE zeroproduct FROM '\''%s'\'';\n' \
    'UPDATE products SET quantity = 0 WHERE id = ?'

  # Work inside a transaction, so if something goes wrong,
  # like the sql file is incomplete, it can be rolled-back.
  printf 'START TRANSACTION;\n'

  while IFS=, read -r -a ids; do
    for id in "${ids[@]}"; do
      # Set the value of the @id argument in SQL
      # And execute the SQL statement with the @id argument
      # that will replace the '?'
      printf 'SET @id='\''%8d'\''; EXECUTE zeroproduct USING @id;\n' \
        "$((id))" # Ensure id is an integer
    done
  done <ids.csv

  # Now commit all these changes since we are finally here
  printf 'COMMIT;\n'

  # Deallocate the prepared statement once we are done
  printf 'DEALLOCATE PREPARE zeroproduct;\n'

} >update_products.sql

# Good to have if this is transmitted remotely
sha512sum update_products.sql >update_products.sql.sha512sum

# can later check with:
sha512sum -c update_products.sql.sha512sum

From the provided sample csv, here is the content of update_products.sql:

PREPARE zeroproduct FROM 'UPDATE products SET quantity = 0 WHERE id = ?';
START TRANSACTION;
SET @id='29294848'; EXECUTE zeroproduct USING @id;
SET @id='29294849'; EXECUTE zeroproduct USING @id;
SET @id='29294850'; EXECUTE zeroproduct USING @id;
SET @id='29294851'; EXECUTE zeroproduct USING @id;
SET @id='29294853'; EXECUTE zeroproduct USING @id;
SET @id='29294857'; EXECUTE zeroproduct USING @id;
SET @id='29294858'; EXECUTE zeroproduct USING @id;
SET @id='29294860'; EXECUTE zeroproduct USING @id;
SET @id='29294861'; EXECUTE zeroproduct USING @id;
SET @id='29294863'; EXECUTE zeroproduct USING @id;
SET @id='29294887'; EXECUTE zeroproduct USING @id;
SET @id='29294888'; EXECUTE zeroproduct USING @id;
SET @id='29294889'; EXECUTE zeroproduct USING @id;
SET @id='29294890'; EXECUTE zeroproduct USING @id;
SET @id='29294891'; EXECUTE zeroproduct USING @id;
SET @id='29294892'; EXECUTE zeroproduct USING @id;
SET @id='29294895'; EXECUTE zeroproduct USING @id;
SET @id='29294897'; EXECUTE zeroproduct USING @id;
SET @id='29294898'; EXECUTE zeroproduct USING @id;
SET @id='29294899'; EXECUTE zeroproduct USING @id;
SET @id='29294901'; EXECUTE zeroproduct USING @id;
SET @id='29294903'; EXECUTE zeroproduct USING @id;
SET @id='29294912'; EXECUTE zeroproduct USING @id;
SET @id='29294916'; EXECUTE zeroproduct USING @id;
COMMIT;
DEALLOCATE PREPARE zeroproduct;

Upvotes: 0

Paul Hodges

Reputation: 15293

I'd probably start with

$: sed "s/ *= %ID%/ IN ( $(echo $(<ids.csv) ) )/" template.sql > update_products.sql

but if it's a lot of id's I'm not sure what your limits are, and I honestly don't remember whether that's an ANSI standard structure...

SO...

$: while IFS=, read -a ids
> do for id in ${ids[@]}
>    do echo "UPDATE products SET quantity = 0 WHERE id = $id;"
>    done
> done < ids.csv > update_products.sql
$: cat update_products.sql
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;

Upvotes: 1

suspectus

Reputation: 17268

Don't need to use %ID% - the ids.txt only needs to be prefixed with SQL like so, writing the output to product_updates.sql output file:

awk -F, '{printf "%s (%s)\n", "UPDATE products SET quantity = 0 WHERE id IN ", $0}' ids.txt > product_updates.sql

Upvotes: 1

Using bash, awk or sed to templatize a CSV file into a SQL file

Answers (4)

Related Questions