Saurabh Dhage
Saurabh Dhage

Reputation: 115

percentage difference of two values on same cloumn

I want to get the percentage difference between the two values from the avg column to be shown in the different column as the variance. following example is of dummy table as the data in the table is large. The values not necessarily to be the same.

The first value each resource id is the current month value while other is the average of all the previous months. This result is obtained from 'union all' from two queries. I want to calculate the percentage increase/decrease in avg value of each resource id

Table

resource_id avg
----------- -------
101         20
101         15
102         18
102         20

The results I need

resource_id variance
----------- ---------
101         25
102        -10

The original query from which I get the table.

SELECT resource_id,
       avg
FROM   (SELECT resource_id,
               Avg(sum_cost) AS AVG
        FROM  (SELECT resource_id,
                      month,
                      Sum(sum_cost)AS SUM_COST
               FROM   (SELECT billing_bigquery_data.labels_value         AS
                              resource_id
                              ,
                              Extract(month FROM Date(
                                      usage_start_time)) AS MONTH,
                              Sum(cost)                                  AS
                              sum_cost
                       FROM   PUBLIC.billing_bigquery_data
                              INNER JOIN PUBLIC.conf_data_history
                                      ON billing_bigquery_data.labels_value =
                                         conf_data_history.resource_id
                       WHERE  conf_data_history.metric_name IN( 'VOLUME_TYPE' )
                              AND labels_key IN( 'disk_id', 'instance_id' )
                              AND Extract(month FROM Date(usage_start_time)) !=
                                  '12'
                       GROUP  BY conf_data_history.metric_value,
                                 billing_bigquery_data.labels_value,
                                 billing_bigquery_data.usage_start_time)AS test
               GROUP  BY month,
                         resource_id
               ORDER  BY resource_id) AS test
        GROUP  BY resource_id
        UNION ALL
        SELECT resource_id,
               Avg(sum_cost) AS AVG
        FROM  (SELECT resource_id,
                      month,
                      Sum(sum_cost)AS SUM_COST
               FROM   (SELECT billing_bigquery_data.labels_value         AS
                              resource_id
                              ,
                              Extract(month FROM Date(
                                      usage_start_time)) AS MONTH,
                              Sum(cost)                                  AS
                              sum_cost
                       FROM   PUBLIC.billing_bigquery_data
                              INNER JOIN PUBLIC.conf_data_history
                                      ON billing_bigquery_data.labels_value =
                                         conf_data_history.resource_id
                       WHERE  conf_data_history.metric_name IN( 'VOLUME_TYPE' )
                              AND labels_key IN( 'disk_id', 'instance_id' )
                              AND Extract(month FROM Date(usage_start_time)) =
                                  '12'
                       GROUP  BY conf_data_history.metric_value,
                                 billing_bigquery_data.labels_value,
                                 billing_bigquery_data.usage_start_time)AS test
               GROUP  BY month,
                         resource_id
               ORDER  BY resource_id) AS test
        GROUP  BY resource_id) tt
ORDER  BY resource_id 

Upvotes: 2

Views: 557

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269513

Your question assumes an ordering to the rows. However, SQL tables represent unordered sets. So, there is no way to answer your question, unless a column as this orders.

If there is such a column, then you can use that. One method would be:

select resource_id,
       (next_avg - "avg") * 100.0 / "avg" as variance
from (select t.*,
             lead("avg") over (partition by resouce_id order by <ordering_column) as next_avg
      from t
     ) t
where next_avg is not null;

This works specifically for your data, with two rows per resource id.

If I follow your query correctly, then you want something like this:

SELECT EXTRACT(month FROM Date(usage_start_time)) as month,
       cdh.resource_id,
       SUM(cost) as month_cost,
       AVG(SUM(cost)) OVER (PARTITION BY cdh.resource_id ORDER BY MIN(usage_start_time) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as prev_avg,
       (100.0 - (AVG(SUM(cost)) OVER (PARTITION BY cdh.resource_id ORDER BY MIN(usage_start_time) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) * 100.0
       ) / SUM(cost) as variance
FROM PUBLIC.billing_bigquery_data bbd JOIN
     PUBLIC.conf_data_history cdh
     ON bbd.labels_value = cdh.resource_id
WHERE cdh.metric_name IN ( 'VOLUME_TYPE' ) AND
      labels_key IN ( 'disk_id', 'instance_id' )
GROUP BY month, resource_id;

Your query is much more complicated than necessary.

Upvotes: 2

Serkan Arslan
Serkan Arslan

Reputation: 13393

you can try this.

SELECT resource_id, 
  100 * ( (MAX(avg)-MIN(avg)) / MAX(avg) ) 
FROM SampleTable
GROUP BY resource_id

if the comparison is between the current month and the previous.

SELECT resource_id, 
  100 * ( MAX(CASE WHEN RN = 1 THEN avg ELSE 0 END) - MAX(CASE WHEN RN = 2 THEN avg ELSE 0 END) )
   / MAX(CASE WHEN RN = 1 THEN avg ELSE 0 END)
FROM (
  SELECT resource_id, avg, ROW_NUMBER() OVER(PARTITION BY resource_id ORDER BY month desc) RN FROM SampleTable
) AS T
group by resource_id

Upvotes: 3

Related Questions