Javier C. H.
Javier C. H.

Reputation: 2123

Optimization of BigQuery aggregated data between two dates

I'm trying to compare the result of one query (example: number of rows), between the beginning of the current month and the current date, compared to what happened the previous month.

Example: Today is 25/01, so I'd like to know the number of rows created between the 01/01 and 25/01, vs the previous month (same interval) 01/12 and 25/12.

I'd like to retrieve it in one row, so that I can return the value of the current month, and the string : up/down depending on whether there have been more or less rows, compared to the previous month.

I've managed to get it working this way, but it looks too hacky and I'd like to know if there is a better approach (apart from retrieving two rows and process the result).

SELECT MAX(total_current) as current, IF(MAX(total_current) > MAX(total_previous), 'up', 'down') as status, 'Number of Items'
FROM
(SELECT INTEGER(count(*)) as total_current, INTEGER(0) as total_previous
FROM [mybucket.mytable] 
WHERE mydate BETWEEN TIMESTAMP(STRFTIME_UTC_USEC(CURRENT_TIMESTAMP(), "%Y-%m-01")) and CURRENT_TIMESTAMP()),
(SELECT INTEGER(count(*)) as total_previous, INTEGER(0) as total_current
FROM [mybucket.mytable] 
WHERE mydate 
BETWEEN DATE_ADD(TIMESTAMP(STRFTIME_UTC_USEC(CURRENT_TIMESTAMP(), "%Y-%m-01")), -1, 'MONTH') 
AND DATE_ADD(CURRENT_TIMESTAMP(), -1, 'MONTH'))

Does it make sense, or is absolutely wrong? If so, how could I improve it, or it's just that this kind of things aren't supposed to be done in a query.

Upvotes: 2

Views: 2367

Answers (3)

steven.levey
steven.levey

Reputation: 896

I modified something I use to run Current Month To Date vs Previous Month To Date comparisons based on Total Sales. So 1 Feb - 25 Feb(Current DateTime) vs 1 Jan to 25 Jan. You can adapt the code below to help.

SELECT STRING(thisMonth) as ThisMonth,STRING(lastMonth) as LastMonth, IF(thisMonth <       lastMonth,"down","up") as Status
FROM
  (SELECT SUM(this_month)as thisMonth, SUM(last_month)as lastMonth
  FROM
    (Select SUM(CASE WHEN MONTH(Sales_Date) = MONTH(CURRENT_TIMESTAMP()) THEN  FLOAT(Unit_Price) ELSE 0 END) this_month, SUM(CASE WHEN MONTH(Sales_Date) =  MONTH(DATE_ADD(CURRENT_TIMESTAMP(),-1,"MONTH")) THEN FLOAT(Unit_Price) ELSE 0 END)  last_month
    FROM [mybucket.mytable]
    Where Sales_Date >= DATE_ADD(TIMESTAMP(CURRENT_DATE()),-2,"MONTH") AND  DAY(Sales_Date) <= DAY(TIMESTAMP(CONCAT(CURRENT_DATE(),"23:59:59"))) 
    ))

Notes:

I add in the DATE_ADD with the -2, "Month" in the WHERE statement to make sure I include the whole of the last month. If you just say -1 then you can miss certain days... I add in the 23:23:59 because I have a Time attached to my Date so I want to make sure I include the entire day. I convert my final display into a STRING because BigQuery returns some FLOAT values in Scientific Notation and that can look confusing. You may not have that issue.

Upvotes: 0

user275683
user275683

Reputation:

Ok lets try again, not that I think I understand what you want. This is somewhat cleaner. Also I included case statement to check if Current and Previous are Equal

DECLARE @dayInt INTEGER = 18;

SELECT COUNT(CASE WHEN MONTH(GETDATE()) = MONTH(d.Date_Dt) THEN 1
                  ELSE NULL
             END) AS 'Current'
       ,CASE WHEN COUNT(CASE WHEN MONTH(GETDATE()) = MONTH(d.Date_Dt) THEN 1
                             ELSE NULL
                        END) > COUNT(CASE WHEN MONTH(GETDATE()) <> MONTH(d.Date_Dt) THEN 1
                                          ELSE NULL
                                     END) THEN 'UP'
             WHEN COUNT(CASE WHEN MONTH(GETDATE()) = MONTH(d.Date_Dt) THEN 1
                             ELSE NULL
                        END) = COUNT(CASE WHEN MONTH(GETDATE()) <> MONTH(d.Date_Dt) THEN 1
                                          ELSE NULL
                                     END) THEN 'Equal'
             ELSE 'Down'
        END AS 'Status'
       ,'Number of Items'
    FROM dbo.Date AS d
    WHERE DAY(d.Date_Dt) <= @dayInt
        AND ( d.Date_Dt BETWEEN DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0) AND GETDATE() )

Upvotes: 0

Javier C. H.
Javier C. H.

Reputation: 2123

I've managed to optimise/simplify the query as follows:

SELECT TOP(LEFT(DATE(mydate), 7), 2) as month, count(*) as total
FROM [mybucket.mytable]
WHERE DAY(mydate) BETWEEN 1 and DAY(CURRENT_TIMESTAMP())
AND LEFT(DATE(mydate), 7) >= LEFT(DATE(DATE_ADD(CURRENT_TIMESTAMP(), -1, 'MONTH')), 7);

However, I still would like to get only one row with the result of the current month and up/down compared to the previous month.

For example if the query returns this:

1   2013-12 48946    
2   2014-01 40497

In that case I'd like to get the following row: 1 40497 'down' (because previous month's value was higher).
Is there any way to do it? Thanks

Upvotes: 4

Related Questions