Reputation: 111
I have an order table uploaded to BigQuery that contains the following headers
ConsumerID, TransactionDate, Revenue, OrderID
ConsumerID & OrderID are Integers TransactionDate is a TIMESTAMP
The Data is Structured as Follows
ConsumerId || TransactionDate || Revenue || OrderID
1 || 2014-10-27 00:00:00 UTC || 55 || 653745
1 || 2015-02-27 00:00:00 UTC || 65 || 767833
1 || 2015-12-27 00:00:00 UTC || 456 || 5676324
2 || 2014-10-27 00:00:00 UTC || 56 || 435261
2 || 2016-02-27 00:00:00 UTC || 43 || 5632436724
So my expected output would be
ConsumerId || Count Of Orders In Last 12 months
1 || 2
2 || 1
I want to do a count of the number of orders a customer has placed in the 1st 12 months since the date of their first order.
In big query i have written the following
SELECT
ConsumerId,
COUNT(OrderNumber BETWEEN MIN(TransactionDate)AND DATE_ADD(MIN(TransactionDate),11,"MONTH")) AS CountOfOrdersTwelve,
FROM
[ordertable.orders]
GROUP BY
1,
2
ORDER BY
ConsumerId ;
However this errors with the following
Error: (L3:157): Cannot group by an aggregate.
Does anyone know a way this can be done in bigquery?
Upvotes: 1
Views: 360
Reputation: 173028
Quick option for you to consider (assuming input like below)
(SELECT 1 AS ConsumerID, '2014-01-01' AS TransactionDate, 1 AS OrderID),
(SELECT 1 AS ConsumerID, '2014-05-01' AS TransactionDate, 2 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
(SELECT 1 AS ConsumerID, '2015-05-01' AS TransactionDate, 6 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 1 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 2 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-01-01' AS TransactionDate, 3 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-03-01' AS TransactionDate, 4 AS OrderID),
(SELECT 2 AS ConsumerID, '2015-04-01' AS TransactionDate, 5 AS OrderID),
(SELECT 2 AS ConsumerID, '2016-05-01' AS TransactionDate, 6 AS OrderID),
(SELECT 3 AS ConsumerID, '2015-04-01' AS TransactionDate, 1 AS OrderID),
(SELECT 3 AS ConsumerID, '2015-05-01' AS TransactionDate, 2 AS OrderID)
Your data can be different by data types so you will need adjust accordingly
SELECT ConsumerID, MAX(CountOfOrders) AS CountOfOrdersTwelve
FROM (
SELECT ConsumerID, CountOfOrders
FROM (
SELECT
ConsumerID, TransactionDate,
COUNT(1) OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS CountOfOrders,
FIRST_VALUE(TransactionDate)
OVER(PARTITION BY ConsumerID ORDER BY TransactionDate) AS firstTransactionDate
FROM [ordertable.orders]
) HAVING DATEDIFF(TransactionDate, firstTransactionDate) <= 365
) GROUP BY ConsumerID ORDER BY ConsumerID
Compact version
Note: This version works with both STRING
(as in example for above first solution) and TIMESTAMP
(as in your updated question) datatype for TransactionDate
SELECT
ConsumerID, CountOfOrdersTwelve
FROM (
SELECT
ConsumerID,
TIMESTAMP_TO_SEC(TIMESTAMP(TransactionDate)) AS ts,
COUNT(ts) OVER (PARTITION BY ConsumerID ORDER BY ts
RANGE BETWEEN CURRENT ROW AND 365*24*3600 FOLLOWING) AS CountOfOrdersTwelve,
ROW_NUMBER() OVER(PARTITION BY ConsumerID ORDER BY ts) AS pos
FROM [ordertable.orders]
)
WHERE pos = 1
ORDER BY ConsumerID
Upvotes: 2