hotmeatballsoup
hotmeatballsoup

Reputation: 605

Using bash, awk or sed to templatize a CSV file into a SQL file

I will have a CSV file (say, ids.csv) that I need to ETL into a SQL script (say, update_products.sql). The CSV will be headerless and will consist of comma-delimited numbers (product IDs in a database), for instance:

29294848,29294849,29294850,29294851,29294853,29294857,29294858,29294860,29294861,29294863,29294887,29294888,
29294889,29294890,29294891,29294892,29294895,29294897,29294898,29294899,29294901,29294903,29294912,29294916

Starting with a SQL "template" file (template.sql) that looks something like this:

UPDATE products SET quantity = 0 WHERE id = %ID%;

I'm looking for a way via bash, awk, sed (or any other type of shell scripting tool), to templatize %IDS% with the values in the CSV, hence turning the generated SQL into something like:

UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
... etc, for all the IDs in the CSV...

Super flexible here:

Any ideas how I might be able to accomplish this?

Upvotes: 1

Views: 881

Answers (4)

David C. Rankin
David C. Rankin

Reputation: 84561

In addition to the answer by @suspectus which provides a nice use of printf to output each line wanted, a slightly more procedural use of awk incorporating a for loop over the fields would be:

awk -F, '{
    for (i=1;i<=NF;i++)
        print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
}' file.csv

Where the single rule simply loops over each of the comma-separated fields using string-concatenation to form the desired output. In detail the awk command:

  • awk -F, sets the field-separator (FS) equal to a comma to split the input,
  • for (i=1;i<=NF;i++) simply loops over each field, and
  • print "UPDATE products SET quantity = 0 WHERE id = " $i ";" outputs the wanted text incorporating the field within using string-concatenation.

Example Use/Output

With your data in file.csv (presumed to be a single line, but it really doesn't matter) your output would be:

$ awk -F, '{
>     for (i=1;i<=NF;i++)
>         print "UPDATE products SET quantity = 0 WHERE id = " $i ";"
> }' file.csv
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;

Look things over and let me know if you have further questions.

Upvotes: 0

L&#233;a Gris
L&#233;a Gris

Reputation: 19555

I propose to be safe rather than sorry.

May be deemed pedantic, but working with business database is serious matter.

So here it is based on @Paul Hodges's answer

#!/usr/bin/env bash

{
  # Use the prepared statements `zeroproduct` 
  # to protect against SQL injections
  printf 'PREPARE zeroproduct FROM '\''%s'\'';\n' \
    'UPDATE products SET quantity = 0 WHERE id = ?'

  # Work inside a transaction, so if something goes wrong,
  # like the sql file is incomplete, it can be rolled-back.
  printf 'START TRANSACTION;\n'

  while IFS=, read -r -a ids; do
    for id in "${ids[@]}"; do
      # Set the value of the @id argument in SQL
      # And execute the SQL statement with the @id argument
      # that will replace the '?'
      printf 'SET @id='\''%8d'\''; EXECUTE zeroproduct USING @id;\n' \
        "$((id))" # Ensure id is an integer
    done
  done <ids.csv

  # Now commit all these changes since we are finally here
  printf 'COMMIT;\n'

  # Deallocate the prepared statement once we are done
  printf 'DEALLOCATE PREPARE zeroproduct;\n'

} >update_products.sql

# Good to have if this is transmitted remotely
sha512sum update_products.sql >update_products.sql.sha512sum

# can later check with:
sha512sum -c update_products.sql.sha512sum

From the provided sample csv, here is the content of update_products.sql:

PREPARE zeroproduct FROM 'UPDATE products SET quantity = 0 WHERE id = ?';
START TRANSACTION;
SET @id='29294848'; EXECUTE zeroproduct USING @id;
SET @id='29294849'; EXECUTE zeroproduct USING @id;
SET @id='29294850'; EXECUTE zeroproduct USING @id;
SET @id='29294851'; EXECUTE zeroproduct USING @id;
SET @id='29294853'; EXECUTE zeroproduct USING @id;
SET @id='29294857'; EXECUTE zeroproduct USING @id;
SET @id='29294858'; EXECUTE zeroproduct USING @id;
SET @id='29294860'; EXECUTE zeroproduct USING @id;
SET @id='29294861'; EXECUTE zeroproduct USING @id;
SET @id='29294863'; EXECUTE zeroproduct USING @id;
SET @id='29294887'; EXECUTE zeroproduct USING @id;
SET @id='29294888'; EXECUTE zeroproduct USING @id;
SET @id='29294889'; EXECUTE zeroproduct USING @id;
SET @id='29294890'; EXECUTE zeroproduct USING @id;
SET @id='29294891'; EXECUTE zeroproduct USING @id;
SET @id='29294892'; EXECUTE zeroproduct USING @id;
SET @id='29294895'; EXECUTE zeroproduct USING @id;
SET @id='29294897'; EXECUTE zeroproduct USING @id;
SET @id='29294898'; EXECUTE zeroproduct USING @id;
SET @id='29294899'; EXECUTE zeroproduct USING @id;
SET @id='29294901'; EXECUTE zeroproduct USING @id;
SET @id='29294903'; EXECUTE zeroproduct USING @id;
SET @id='29294912'; EXECUTE zeroproduct USING @id;
SET @id='29294916'; EXECUTE zeroproduct USING @id;
COMMIT;
DEALLOCATE PREPARE zeroproduct;

Upvotes: 0

Paul Hodges
Paul Hodges

Reputation: 15293

I'd probably start with

$: sed "s/ *= %ID%/ IN ( $(echo $(<ids.csv) ) )/" template.sql > update_products.sql

but if it's a lot of id's I'm not sure what your limits are, and I honestly don't remember whether that's an ANSI standard structure...

SO...

$: while IFS=, read -a ids
> do for id in ${ids[@]}
>    do echo "UPDATE products SET quantity = 0 WHERE id = $id;"
>    done
> done < ids.csv > update_products.sql
$: cat update_products.sql
UPDATE products SET quantity = 0 WHERE id = 29294848;
UPDATE products SET quantity = 0 WHERE id = 29294849;
UPDATE products SET quantity = 0 WHERE id = 29294850;
UPDATE products SET quantity = 0 WHERE id = 29294851;
UPDATE products SET quantity = 0 WHERE id = 29294853;
UPDATE products SET quantity = 0 WHERE id = 29294857;
UPDATE products SET quantity = 0 WHERE id = 29294858;
UPDATE products SET quantity = 0 WHERE id = 29294860;
UPDATE products SET quantity = 0 WHERE id = 29294861;
UPDATE products SET quantity = 0 WHERE id = 29294863;
UPDATE products SET quantity = 0 WHERE id = 29294887;
UPDATE products SET quantity = 0 WHERE id = 29294888;
UPDATE products SET quantity = 0 WHERE id = 29294889;
UPDATE products SET quantity = 0 WHERE id = 29294890;
UPDATE products SET quantity = 0 WHERE id = 29294891;
UPDATE products SET quantity = 0 WHERE id = 29294892;
UPDATE products SET quantity = 0 WHERE id = 29294895;
UPDATE products SET quantity = 0 WHERE id = 29294897;
UPDATE products SET quantity = 0 WHERE id = 29294898;
UPDATE products SET quantity = 0 WHERE id = 29294899;
UPDATE products SET quantity = 0 WHERE id = 29294901;
UPDATE products SET quantity = 0 WHERE id = 29294903;
UPDATE products SET quantity = 0 WHERE id = 29294912;
UPDATE products SET quantity = 0 WHERE id = 29294916;

Upvotes: 1

suspectus
suspectus

Reputation: 17268

Don't need to use %ID% - the ids.txt only needs to be prefixed with SQL like so, writing the output to product_updates.sql output file:

awk -F, '{printf "%s (%s)\n", "UPDATE products SET quantity = 0 WHERE id IN ", $0}' ids.txt > product_updates.sql

Upvotes: 1

Related Questions