theresemoreau
theresemoreau

Reputation: 109

bigquery, is a "subtable" possible?

In bigquery using legacy sql I have created a monstrous query that returns the following display of visits per day for a site that I released 2018-02-26:

Row  date       name    release_date  visits_count
1    20180226   a_name  20180226      2179
2    20180227   a_name  20180226      9522
3    20180228   a_name  20180226      1593   
4    20180301   a_name  20180226      300    
...

What I really want is

Row  name    release   count_release  count_release+1  count_release_rest
1    a_name  20180226  2179           9522             1893  

Thus, I want the actual visit count for release date, the day after the release date and all counts after that should just be summed. I'm new to bigquery (and kind of new to sql...). Is there a way to define my first display as a "subtable" or something like that so that I can do this or what approach would you recommend?

Upvotes: 0

Views: 777

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '20180226' date, 'a_name' name, '20180226' release_date, 2179 visits_count UNION ALL
  SELECT '20180227', 'a_name', '20180226', 9522 UNION ALL
  SELECT '20180228', 'a_name', '20180226', 1593 UNION ALL   
  SELECT '20180301', 'a_name', '20180226', 300  
)
SELECT name, release_date, 
  SUM(CASE WHEN date = release_date THEN visits_count END) count_release,
  SUM(CASE WHEN PARSE_DATE('%Y%m%d', date) = DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY) THEN visits_count END) count_release_next_day,
  SUM(CASE WHEN PARSE_DATE('%Y%m%d', date) > DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY) THEN visits_count END) count_release_rest
FROM `project.dataset.table`
GROUP BY name, release_date   

or above can be "refactored" to avoid repeating PARSE_DATE, so query looks more compact and easier to manage

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '20180226' date, 'a_name' name, '20180226' release_date, 2179 visits_count UNION ALL
  SELECT '20180227', 'a_name', '20180226', 9522 UNION ALL
  SELECT '20180228', 'a_name', '20180226', 1593 UNION ALL   
  SELECT '20180301', 'a_name', '20180226', 300  
)
SELECT name, release_date, 
  SUM(CASE WHEN date = release_date THEN visits_count END) count_release,
  SUM(CASE WHEN visit = release_next_day THEN visits_count END) count_release_next_day,
  SUM(CASE WHEN visit > release_next_day THEN visits_count END) count_release_rest
FROM `project.dataset.table`, 
UNNEST([STRUCT<visit DATE, release_next_day DATE>(
  PARSE_DATE('%Y%m%d', date), 
  DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY))]) x
GROUP BY name, release_date      

in both cases result is

Row name    release_date    count_release   count_release_next_day  count_release_rest   
1   a_name  20180226        2179            9522                    1893     

Upvotes: 0

Bharat Mishra
Bharat Mishra

Reputation: 91

There are lot of ways you can achieve this functionality. The simplest way to do it is compare date with case statement.

select name, sum(case when date = relese_date then 1 else 0) as release_count, 
sum(case when date = DATE_ADD(relese_date,1,"DAY") then 1 else 0) as release_count1
sum(case when date > DATE_ADD(relese_date,1,"DAY") then 1 else 0) as release_count_other

Upvotes: 1

Related Questions