JamesM
JamesM

Reputation: 81

Generalising a parameterized query

I've created (with help from StackOverflow!) a reasonably complicated SQL query to perform cohort analysis on how purchasing a particular product affects revenue.

I created it to be parameterized, so that the query can be passed ${PRODUCT} as the product to evaluate and it'll generate the report.

SELECT 
  DATE_DIFF(PARSE_DATE("%Y-%m", month), PARSE_DATE("%Y-%m", earliest_use), MONTH) as nplus,
  CASE WHEN ProductName = '${PRODUCT}' THEN '${PRODUCT}' ELSE 'Other' END as product,
  CustomerInfo,
  AVG(net_revenue) as avg_net_revenue,
  COUNT(DISTINCT CustomerID) as customers
FROM (
SELECT 
  month,
  rev.CustomerID,
  ProductName,
  customers.CustomerInfo,
  SUM(Revenue) AS net_revenue,
  MIN(MIN(CASE WHEN ProductName = '${PRODUCT}' THEN month END)) OVER (PARTITION BY rev.CustomerID) AS earliest_use 
FROM
  RevenueTable as rev,
  CustomerTable AS customers 
WHERE
  customers.CustomerID = rev.CustomerID
  AND rev.CustomerID IN (
      SELECT DISTINCT CustomerID
      FROM RevenueTable
      WHERE ProductName = '${PRODUCT}'
    )
GROUP BY 1,2,3,4
)
GROUP BY nplus, product, CustomerInfo
HAVING nplus >= -7
ORDER BY nplus

However, I now want to create a generic version that simple iterates this query over ALL products.

Any help would be much appreciated.

Upvotes: 0

Views: 82

Answers (1)

Noe Romero
Noe Romero

Reputation: 400

If you are using CLI to run your parameterized query, you could use an script like:

bq query --format=csv --use_legacy_sql=false  'SELECT product_desc FROM `PROJECT.DATASET.products_list`' | xargs -I {} bq query --use_legacy_sql=false --parameter=product::{} 'SELECT fact_id, product_desc FROM  `PROJECT.DATASET.product_fact_table` WHERE product_desc = @product'

In this script, I'm querying the name of the products from a table named "products_list" and then passing each one to the query that get all the operations related to a specific product in the table called "product_fact_table". The result looks like:

script result

On the other hand, if you are using the API or some of the client libraries (Go, Java, Node.js or Python), I think you only need to build a loop according with the programming language you are using.

The BigQuery UI currently doesn't support parameterized queries.

In addition, I would like to share with you an additional alternative using Scripting:

DECLARE count INT64 default 0;
DECLARE target_product STRING;
DECLARE products_array ARRAY<STRING>;
DECLARE number_of_products INT64;

SET number_of_products = (
WITH products AS
  (SELECT ARRAY (select product_desc from `PROJECT_ID.DATASET.products_list`) as products_result)
SELECT ARRAY_LENGTH(products_result) AS len
FROM products);

SET products_array = (
      SELECT ARRAY_AGG(product_desc)
      FROM `PROJECT_ID.DATASET.products_list`
    ); 

LOOP  
  SET target_product = products_array[OFFSET(count)];
  SELECT * FROM `PROJECT_ID.DATASET.product_fact_table` WHERE product_desc=target_product;
  SET count = count + 1;
  IF count >= number_of_products THEN
    LEAVE;
  END IF;
END LOOP;

Here, I'm declaring the "count" variable and using it to count the number of iterations processed, the "target_product" variable contains the name of the product on each iteration, the "products_array" is set with all the products from a query and, finally, the "number_of_products" contains the total of products and it is used to limit the number of loop iterations.

Upvotes: 1

Related Questions