Ayberk Yavuz
Ayberk Yavuz

Reputation: 431

Is there a way to optimize my bigquery sql in order to run faster?

There are some columns that i want to fecth under the conditions.

Columns are : point_id, weight(row currentness index 0 to 7, 0 is the most current row), localdate (YYYYmmdd), tmin(minimum temperature), tmax (maximum temperature) and precip_amount (precipitation amount mm).

The conditions are :
(localdate >= 20151201 AND localdate <= 20160104) AND (tmin < 0 OR tmax < 0) AND precip_amount > 40

My aim is to fetch the most current rows based on weight.

I wrote a sql that works fine (2.7s elapsed, 64.4 MB processed).

But, is there a way to optimize my sql in order to run faster ?

My sql :

select a.point_id as point_id , a.weight as min_weight,a.localdate as local_date, a.tmin as temp_min, a.tmax as temp_max, a.precip_amount as precipitation
from table1 a
join (select point_id, min(weight) as min_weight
            from 
            (select point_id, localdate, tmin, tmax, precip_amount,weight
            from table1
            where (localdate >= 20151201 and localdate <= 20160104)  and (tmin < 0 or tmax < 0) and precip_amount > 40
            order by weight)
            group by point_id) b
on a.point_id = b.point_id and a.weight = b.min_weight
where (a.localdate >= 20151201 and a.localdate <= 20160104)  and (a.tmin < 0 OR a.tmax < 0) and a.precip_amount > 40
order by a.weight, a.localdate

Upvotes: 3

Views: 245

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1271121

Try using a window function:

select point_id, a.weight as min_weight, local_date, temp_min, temp_max,
       precip_amount
from (select a.point_id as point_id, a.weight,
             a.localdate as local_date, a.tmin as temp_min, a.tmax as temp_max,
             a.precip_amount as precipitation,
             min(weight) over (partition by point_id) as min_weight
     from table1 a
     where (a.localdate >= 20151201 and a.localdate <= 20160104) and
           (a.tmin < 0 OR a.tmax < 0) and a.precip_amount > 40
    ) a
where a.weight = w.min_weight
order by a.weight, a.localdate;

Upvotes: 0

paparazzo
paparazzo

Reputation: 45106

select * 
from 
( select a.point_id as point_id , a.weight as min_weight,a.localdate as local_date, a.tmin as temp_min, a.tmax as temp_max, a.precip_amount as precipitation  
       , row_number() over (partition by point_id order by weight asc) as rn
    from table1 a
   where (a.localdate >= 20151201 and a.localdate <= 20160104)   
     and (a.tmin < 0 OR a.tmax < 0) 
     and a.precip_amount > 40
) tt
where tt.rn = 1
order by tt.weight, tt.localdate

Upvotes: 1

fabulaspb
fabulaspb

Reputation: 1263

Try this query. I have moved calculation on minimum weight to subquery.

SELECT a.point_id as point_id, 
        (SELECT MIN(A2.weight) FROM Table A2
         WHERE A2.point_id = A.point_id
               AND A2.localdate >= 20151201 
               AND A2.localdate <= 20160104  
               AND (A2.tmin < 0 or A2.tmax < 0) 
               AND A2.precip_amount > 40) AS min_weight,
        A.localdate as local_date, 
        A.tmin as temp_min, 
        A.tmax as temp_max, 
        A.precip_amount as precipitation
FROM Table AS A
WHERE  A.localdate >= 20151201
       AND A.localdate <= 20160104 
       AND (A.tmin < 0 OR A.tmax < 0) 
       AND A.precip_amount > 40
ORDER BY A.weight, A.localdate

You can also try use windowed function. For example

SELECT T.* FROM
       (SELECT A.point_id,
               MIN(A.weight) OVER(PARTITION BY A.point_id) AS min_weight,
               A.localdate AS local_date,
               A.tmin AS temp_min,
               A.tmax AS temp_max,
               A.precip_amount as precipitation
        FROM Table A
        WHERE A.localdate >= 20151201
                   AND A.localdate <= 20160104 
                   AND (A.tmin < 0 OR A.tmax < 0) 
                   AND A.precip_amount > 40) AS T
    ORDER BY T.min_weight, 
             T.localdate

Upvotes: 1

Related Questions