samwise

Reputation: 2257

Use multiline regex on cat output

I have the following file, queries.sql, that contains a number of queries structured like this:

/* Query 1 */
SELECT cab_type_id,
       Count(*)
FROM   trips
GROUP  BY 1;

/* Query 2 */
SELECT passenger_count,
       Avg(total_amount)
FROM   trips
GROUP  BY 1;

/* Query 3 */
SELECT passenger_count,
       Extract(year FROM pickup_datetime),
       Count(*)
FROM   trips
GROUP  BY 1,
          2;

Then I wrote a regex that finds all those queries in the file:

/\*[^\*]*\*/[^;]*;

What I'd like to achieve is the following:

  1. Select all the queries with the regex.
  2. Prefix each query with EXPLAIN ANALYZE
  3. Execute each query and output the results to a new file. That means, query 1 will create a file q1.txt with the corresponding output, query 2 create q2.txt etc.

One of my main challenges (there are no problems, right? ;-)) is that I'm rather unfamiliar with the Linux bash I have to use.

I tried cat queries.sql | grep '/\*[^\*]*\*/[^;]*;' but that doesn't return anything.
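For context, I've since learned that grep matches line by line by default, so a pattern spanning lines can never match. A sketch of a multiline match using GNU grep's -z option (which treats the whole file as one NUL-terminated record; the sample file here is just for illustration):

```shell
# a small file in the same layout as queries.sql (assumed)
printf '/* Query 1 */\nSELECT 1\nFROM t;\n\n/* Query 2 */\nSELECT 2\nFROM t;\n' > demo.sql

# with -z the newlines are ordinary characters, so [^;]* can run across
# lines; -o prints only the matches, each terminated by NUL
grep -zo '/\*[^*]*\*/[^;]*;' demo.sql | tr '\0' '\n'
```

Note also that `[^\*]` in my original pattern means "neither backslash nor star" inside a bracket expression; `[^*]` is what I meant.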

So a solution could look like:

count = 0
for query in (cat queries.sql | grep 'somehow-here-comes-my-regex') do
    count = $count+1
    query = 'EXPLAIN ANALYZE '+query
    psql -U postgres -h localhost -d nyc-taxi-data -c query > 'q'$count'.txt'

Except that it doesn't work, and I don't know how to make it work.

Upvotes: 1

Views: 1012

Answers (2)

Ed Morton

Reputation: 203645

It sounds like this is what you're looking for:

awk -v RS= -v ORS='\0' '{print "EXPLAIN ANALYZE", $0}' queries.sql |
while IFS= read -r -d '' query; do
    psql -U postgres -h localhost -d nyc-taxi-data -c "$query" > "q$((++count)).txt"
done

The awk statement outputs each query as a NUL-terminated string; the shell loop reads them one at a time and calls psql on each. Simple, robust, efficient, etc...
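To see how it fits together, here's the same pipeline run on a tiny sample file, with the psql call swapped for a plain file write (the sample layout and the q1.txt/q2.txt names are just for illustration):

```shell
# a small file in the same blank-line-separated layout (assumed)
printf '/* Query 1 */\nSELECT 1;\n\n/* Query 2 */\nSELECT 2;\n' > sample.sql

# RS= puts awk in paragraph mode: each blank-line-separated block is one
# record; ORS='\0' terminates each output record with NUL so the embedded
# newlines survive the trip into the shell loop
awk -v RS= -v ORS='\0' '{print "EXPLAIN ANALYZE", $0}' sample.sql |
while IFS= read -r -d '' query; do
    count=$((count + 1))
    printf '%s\n' "$query" > "q$count.txt"   # stand-in for the psql call
done
```

Afterwards q1.txt holds the first query prefixed with EXPLAIN ANALYZE, q2.txt the second.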

Upvotes: 0

marcell

Reputation: 1528

You have to omit the spaces around = in variable assignments.

The following script should help. Save it in a file, e.g. explain.sh, make it executable with chmod 0700 explain.sh, and run it like this: ./explain.sh queries.sql.

#!/bin/bash

qfile="$1"

# query numbers, e.g. "1 2 3"
n="$(grep -oP '(?<=Query )[0-9]+ ' "$qfile")"

count=1
for q in $n; do
    # Corrected solution, modified after the remarks of @EdMorton
    qn="EXPLAIN ANALYZE $(awk -v n="Query $q" 'flag; $0 ~ n {flag=1} /;/{flag=0}' "$qfile")"
    #qn="EXPLAIN ANALYZE $(awk -v n=$q "flag; /Query $q/{flag=1} /;/{flag=0}" $qfile)"
    # psql -U postgres -h localhost -d nyc-taxi-data -c "$qn" > q$count.txt
    echo "$qn" > q$count.txt
    count=$(( $count + 1 ))
done

First of all, the script expects one argument (your example input file queries.sql). It extracts the query numbers with grep and saves them into the variable n. Then, in a for loop, it iterates through the query numbers and uses awk to extract query number q, prefixing it with EXPLAIN ANALYZE. Then you can run psql with the desired query; here I commented out the psql part. This example script only creates a qN.txt file for each EXPLAIN query.
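To see the extraction step in isolation (assuming GNU grep built with PCRE support, since -P needs it; the sample file is just for illustration):

```shell
# a small file with the same "/* Query N */" headers (assumed layout)
printf '/* Query 1 */\nSELECT 1;\n\n/* Query 2 */\nSELECT 2;\n' > nums.sql

# (?<=Query ) is a lookbehind: -o prints only the digits that follow
# the literal text "Query "
grep -oP '(?<=Query )[0-9]+' nums.sql
```

This prints one query number per line, which the for loop then iterates over via word splitting.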

UPDATE:

The awk part: it is possible to use a shell variable in awk via the -v flag. Here we create an awk variable n with the value of the shell variable q; n is used to build the start pattern, e.g. Query 1. awk -v n="Query $q" 'flag; $0 ~ n {flag=1} /;/{flag=0}' "$qfile" matches everything between Query 1 and the first occurrence of a semicolon (;), excluding the Query 1 line itself. The $(...) is command substitution in bash, which lets us save the output of a shell command into a variable; here we save the output of awk and prefix it with the string EXPLAIN ANALYZE.
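A minimal sketch of the flag idiom on its own (the sample file and the pattern Query 2 are just for illustration):

```shell
printf 'header\n/* Query 2 */\nSELECT a,\n       b\nFROM t;\ntrailer\n' > flagdemo.sql

# "flag;" prints the current line when flag is already set, *before* the
# toggling rules run: so the "Query 2" line itself is excluded, while the
# line containing the terminating ";" is still printed before flag resets
awk -v n="Query 2" 'flag; $0 ~ n {flag=1} /;/{flag=0}' flagdemo.sql
```

This prints the three lines of the SELECT statement and nothing else.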

Here is a great answer about awk pattern matching.

Upvotes: 1
