Reputation: 810
I've saved this query to as a View:
SELECT nth(1,CodAlm) as FirstCode,
nth(1,DesAlm) as FirstDescription,
last(CodAlm) as LastCode,
Last(DesAlm) as LastDescription,
max(DATE(DataTic)) as LastVisit,
min(DATE(DataTic)) as FirstVisit,
DATEDIFF(CURRENT_TIMESTAMP(),TIMESTAMP(max(DATE(DataTic)))) as Diffdays,
count(distinct DATE(DataTic)) as countVisits,
count(distinct CodAlm) as NumberCodes,
sum(subtot) as Totalimport,
TarCli,
Last(nomcli) as Name,
Last(cogcli) as LastName,
Last(emailcli) as email,
Last(sexcli) as gender
FROM (SELECT CodAlm, DesAlm, DataTic,SubTot, TarCli, NomCli,CogCli,EmailCli,SexCli FROM [bime.Sales] where Year(DataTic)>2012 AND IsFirstLine="1" ORDER by TarCli, DataTic)
group each by tarcli
But, When I run any query over this view, bigquery returns me Resources exceeded during query execution. I think that ORDER BY is the cause of my problem, but I need this to show my results correctly. How I could rewrite this query correctly? bime.Sales table have 18 Milion rows.
Upvotes: 0
Views: 561
Reputation: 7046
See this question for more info:
What causes "resources exceeded" in BigQuery?
The error is likely caused by the GROUP EACH BY clause, and the most likely reason is that you have a skewed distribution of keys (i.e., one key with a disproportionate number of records). Can you look at your data distribution, and perhaps filter out any skewed keys?
Also note that the ORDER BY is not guaranteed to be preserved by GROUP EACH BY, so you need to apply the ordering after the GROUP EACH BY. You may find it useful to use analytic functions like FIRST_VALUE, NTH_VALUE, and LAST_VALUE with OVER(PARTITION BY tarcli ORDER BY DataTic) instead of GROUP EACH BY if you want to get reliable ordering.
Upvotes: 2
Reputation: 179
Some things you should consider and try (if you haven't done that so far):
1) Do you really need the "group each by"? Have you tried with just "group by"?
2) Have you tried to use a table instead of a view? You could try to "materialize" the view to check if the resource consumption decreases.
3) Can you shard the data? Perhaps putting each year or month in a different table (using DataTic). That would decrease the size of each table and, therefore, the resource usage.
Cheers!
Upvotes: 1