DarkLeafyGreen
DarkLeafyGreen

Reputation: 70466

Query for negative account balance period in bigquery

I am playing around with bigquery and hit an interesting use case. I have a collection of customers and account balances. The account balances collection records any account balance change.

Customers:

+---------+--------+
|    ID   |  Name  |
+---------+--------+
| 1       | Alice  |
| 2       | Bob    |
+---------+--------+

Accounts balances:

+---------+---------------+---------+------------+
|    ID   |  customer_id  |  value  | timestamp  |
+---------+---------------+---------+------------+
| 1       | 1             |  -500   | 2019-02-12 |
| 2       | 1             |  -200   | 2019-02-10 |
| 3       | 2             |  200    | 2019-02-10 |
| 4       | 1             |  0      | 2019-02-09 |
+---------+---------------+---------+------------+

The goal is to find out, for how long a customer has a negative account balance. The resulting collection would look like this:

+---------+--------+---------------------------------+
|    ID   |  Name  |  Negative account balance since |
+---------+--------+---------------------------------+
| 1       | Alice  |  2 days                         |
+---------+--------+---------------------------------+

Bob is not in the collection, because his last account record shows a positive value.

I think following steps are involved:

Is something like this even possible in sql? Do you have any ideas on who to create such query? To get customers that currently have a negative account balance, I use this query:

SELECT customer_id FROM (
  SELECT t.account_balance, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) as seqnum FROM `account_balances` t
) t
WHERE seqnum = 1 AND account_balance<0

Upvotes: 1

Views: 476

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Below is for BigQuery Standard SQL

#standardSQL
SELECT customer_id, name,
  SUM(IF(negative_positive < 0, days, 0)) negative_days,
  SUM(IF(negative_positive = 0, days, 0)) zero_days,
  SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
  SELECT customer_id, negative_positive, grp,
    1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
  FROM (
    SELECT customer_id, ts, SIGN(value) negative_positive, 
    COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
    FROM (
      SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
      FROM `project.dataset.balances`
    )
  )
  GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.balances` AS (
  SELECT 1 customer_id, -500 value, DATE '2019-02-12' ts UNION ALL
  SELECT 1, -200, '2019-02-10' UNION ALL
  SELECT 2, 200, '2019-02-10' UNION ALL
  SELECT 1, 0, '2019-02-09' 
), `project.dataset.customers` AS (
  SELECT 1 id, 'Alice' name UNION ALL
  SELECT 2, 'Bob' 
)
SELECT customer_id, name,
  SUM(IF(negative_positive < 0, days, 0)) negative_days,
  SUM(IF(negative_positive = 0, days, 0)) zero_days,
  SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
  SELECT customer_id, negative_positive, grp,
    1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
  FROM (
    SELECT customer_id, ts, SIGN(value) negative_positive, 
    COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
    FROM (
      SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
      FROM `project.dataset.balances`
    )
  )
  GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
-- ORDER BY customer_id

with result

Row customer_id name    negative_days   zero_days   positive_days    
1   1           Alice   3               1           0    
2   2           Bob     0               0           1    

Upvotes: 2

Related Questions