Ipkiss

Reputation: 791

How do I count the number of times a specific value occurs in a column, BigQuery Standard SQL

Say I have this table:

+-------+----------+-----------+
| eta   | arrived  | time_diff |
+-------+----------+-----------+
| 06:47 | 06:47    | 0         |
| 08:30 | 08:40    | 10        |
| 10:30 | 10:40    | 10        |
| 10:30 | 10:31    | 1         |
+-------+----------+-----------+
I got time_diff with TIME_DIFF(arrived, eta, MINUTE) AS time_diff.

What I want is to count how many times each value occurs: how many 0s, how many 10s, and so on. Ideally, the table above would yield one 0, two 10s and one 1. Of course, I don't know the time_diff results in advance (there could be 2, 3, 5, ...); I just want to count how many times each result occurs. How do I accomplish this in BigQuery Standard SQL?

Upvotes: 0

Views: 80

Answers (2)

Mikhail Berlyant

Reputation: 172944

The queries below are for BigQuery Standard SQL.

From a practical standpoint, I would recommend grouping by bins (0-9, 10-19, 20-29 and so on), as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '06:47' eta, '06:47' arrived  UNION ALL
  SELECT '08:30', '08:40' UNION ALL
  SELECT '10:30', '10:40' UNION ALL
  SELECT '10:30', '10:31'
)
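SELECT FORMAT('%i - %i', bin_start, bin_start + 9) bin, cnt
FROM (
  SELECT
    -- integer-divide the difference by 10 to get the lower bound of its 10-minute bin
    10 * DIV(TIME_DIFF(PARSE_TIME('%R', arrived), PARSE_TIME('%R', eta), MINUTE), 10) bin_start,
    COUNT(1) cnt
  FROM `project.dataset.table`
  GROUP BY bin_start
)
-- order by the numeric lower bound, not the text label, so '100 - 109' sorts after '20 - 29'
ORDER BY bin_start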

with result

Row     bin         cnt  
1       0 - 9       2    
2       10 - 19     2      

If you need the exact distribution per time_diff value, you can use the query below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '06:47' eta, '06:47' arrived  UNION ALL
  SELECT '08:30', '08:40' UNION ALL
  SELECT '10:30', '10:40' UNION ALL
  SELECT '10:30', '10:31'
)
SELECT 
  TIME_DIFF(PARSE_TIME('%R', arrived) , PARSE_TIME('%R', eta) , MINUTE) diff,
  COUNT(1) cnt
FROM `project.dataset.table`
GROUP BY diff
ORDER BY diff  

with result

Row     diff        cnt  
1       0           1    
2       1           1    
3       10          2    

Upvotes: 1

Eugene

Reputation: 1533

You should use a GROUP BY clause:

SELECT time_diff, COUNT(*) AS cnt
FROM `project.dataset.table`
GROUP BY time_diff
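
Since time_diff in the question is a computed expression rather than a stored column, one way to apply the GROUP BY is to compute it first in a CTE and then count each distinct value. This is only a minimal sketch: it assumes eta and arrived are stored as TIME columns, and `project.dataset.table` and the CTE name diffs are placeholders, not names from the question.

```sql
#standardSQL
-- Sketch only: the table name and the CTE name `diffs` are placeholders.
-- Assumes eta and arrived are TIME columns (if they are strings,
-- wrap them in PARSE_TIME('%R', ...) first).
WITH diffs AS (
  SELECT TIME_DIFF(arrived, eta, MINUTE) AS time_diff
  FROM `project.dataset.table`
)
SELECT
  time_diff,
  COUNT(*) AS cnt   -- number of rows sharing this time_diff value
FROM diffs
GROUP BY time_diff
ORDER BY time_diff
```

For the four sample rows in the question this should give one row per distinct value: 0 with a count of 1, 1 with a count of 1, and 10 with a count of 2.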

Upvotes: 1
