John Mitchell
John Mitchell

Reputation: 111

BigQuery Count Instances between timeperiod Group By

I have an order table uploaded to BigQuery that contains the following headers

ConsumerID, TransactionDate, Revenue, OrderID

ConsumerID & OrderID are Integers TransactionDate is a TIMESTAMP

The Data is Structured as Follows

ConsumerId   || TransactionDate          || Revenue   ||  OrderID
1            || 2014-10-27 00:00:00 UTC  || 55        ||  653745
1            || 2015-02-27 00:00:00 UTC  || 65        ||  767833
1            || 2015-12-27 00:00:00 UTC  || 456       ||  5676324
2            || 2014-10-27 00:00:00 UTC  || 56        ||  435261
2            || 2016-02-27 00:00:00 UTC  || 43        ||  5632436724

So my expected output would be

ConsumerId   || Count Of Orders In Last 12 months
    1        || 2
    2        || 1

I want to do a count of the number of orders a customer has placed in the 1st 12 months since the date of their first order.

In big query i have written the following

SELECT
  ConsumerId,
  COUNT(OrderNumber BETWEEN MIN(TransactionDate)AND DATE_ADD(MIN(TransactionDate),11,"MONTH")) AS CountOfOrdersTwelve,
FROM
  [ordertable.orders]
GROUP BY
  1,
  2
ORDER BY
  ConsumerId ;

However this errors with the following

Error: (L3:157): Cannot group by an aggregate.

Does anyone know a way this can be done in bigquery?

Upvotes: 1

Views: 360

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Quick option for you to consider (assuming input like below)

      (SELECT 1 AS ConsumerID, '2014-01-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 1 AS ConsumerID, '2014-05-01' AS TransactionDate, 2 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
      (SELECT 1 AS ConsumerID, '2015-05-01' AS TransactionDate, 6 AS OrderID),

      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 2 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
      (SELECT 2 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
      (SELECT 2 AS ConsumerID, '2016-05-01' AS TransactionDate, 6 AS OrderID),

      (SELECT 3 AS ConsumerID, '2015-04-01' AS TransactionDate, 1 AS OrderID),
      (SELECT 3 AS ConsumerID, '2015-05-01' AS TransactionDate, 2 AS OrderID)

Your data can be different by data types so you will need adjust accordingly

SELECT ConsumerID, MAX(CountOfOrders) AS CountOfOrdersTwelve
FROM (
  SELECT ConsumerID, CountOfOrders
  FROM (
    SELECT
      ConsumerID, TransactionDate,
      COUNT(1) OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS CountOfOrders,
      FIRST_VALUE(TransactionDate) 
        OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS firstTransactionDate
    FROM [ordertable.orders]
  ) HAVING DATEDIFF(TransactionDate, firstTransactionDate) <= 365
) GROUP BY ConsumerID ORDER BY ConsumerID

Compact version

Note: This version works with both STRING (as in example for above first solution) and TIMESTAMP (as in your updated question) datatype for TransactionDate

SELECT 
  ConsumerID, CountOfOrdersTwelve
FROM (
  SELECT 
    ConsumerID,
    TIMESTAMP_TO_SEC(TIMESTAMP(TransactionDate)) AS ts,
    COUNT(ts) OVER (PARTITION BY ConsumerID ORDER BY ts 
      RANGE BETWEEN CURRENT ROW AND 365*24*3600 FOLLOWING) AS CountOfOrdersTwelve,
    ROW_NUMBER() OVER(PARTITION BY ConsumerID ORDER BY ts) AS pos
  FROM [ordertable.orders]
)
WHERE pos = 1
ORDER BY ConsumerID

Upvotes: 2

Related Questions