Matthew
Matthew

Reputation: 411

how to calculate difference between dates in BigQuery

I have a table named Employees with Columns: PersonID, Name, StartDate. I want to calculate 1) difference in days between the newest and oldest employee and 2) the longest period of time (in days) without any new hires. I have tried to use DATEDIFF, however the dates are in a single column and I'm not sure what other method I should use. Any help would be greatly appreciated

Upvotes: 0

Views: 5279

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
  MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
  SELECT 
    DATE_DIFF(
      StartDate, 
      LAG(StartDate) OVER(ORDER BY StartDate), 
      DAY
    ) days_before_next_hire
  FROM `project.dataset.your_table`
)   

You can test, play with above using dummy data as in the example below

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT DATE '2019-01-01' StartDate UNION ALL
  SELECT '2019-01-03' StartDate UNION ALL
  SELECT '2019-01-13' StartDate 
)
SELECT 
  SUM(days_before_next_hire) AS days_between_newest_and_oldest_employee,
  MAX(days_before_next_hire) - 1 AS longest_period_without_new_hire
FROM (
  SELECT 
    DATE_DIFF(
      StartDate, 
      LAG(StartDate) OVER(ORDER BY StartDate), 
      DAY
    ) days_before_next_hire
  FROM `project.dataset.your_table`
)   

with result

Row days_between_newest_and_oldest_employee longest_period_without_new_hire  
1   12                                      9       

Note use of -1 in calculating longest_period_without_new_hire - it is really up to you to use this adjustment or not depends on your preferences of counting gaps

Upvotes: 1

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

1) difference in days between the newest and oldest record

WITH table AS (
  SELECT DATE(created_at) date, *
  FROM `githubarchive.day.201901*` 
  WHERE _table_suffix<'2'
  AND repo.name = 'google/bazel-common'
  AND type='ForkEvent'
)

SELECT DATE_DIFF(MAX(date), MIN(date),  DAY) max_minus_min
FROM table

2) the longest period of time (in days) without any new records

WITH table AS (
  SELECT DATE(created_at) date, *
  FROM `githubarchive.day.201901*` 
  WHERE _table_suffix<'2'
  AND repo.name = 'google/bazel-common'
  AND type='ForkEvent'
)

SELECT MAX(diff) max_diff
FROM (
  SELECT DATE_DIFF(date, LAG(date) OVER(ORDER BY date), DAY) diff
  FROM table
)

Upvotes: 0

Related Questions