SUM of SUM in BigQuery without duplicates

Question

I have a table with a one-to-many relationship of a business' name to a ZIP code, due to multiple industry codes matching that business for a given ZIP code. A separate table contains households by ZIP codes. To sum the households by business name for a range of ZIP codes without duplicates, I query:

SELECT DBAName, SUM(Households) OVER (PARTITION BY ZIPCode) AS SumHouseholds  
FROM Business  
JOIN Location  
ON Location.ZIPCode = Business.ZIPCode  
WHERE ZIPCode like '1001%' and DBAName = 'ABC Plumbing'  
GROUP BY DBAName, ZIPCode, Households  
ORDER BY SumHouseholds Desc

And get:

Row DBAName SumHouseholds  
1   ABC Plumbing    30  
2   ABC Plumbing    25  
3   ABC Plumbing    5

Each row represents a distinct ZIP code within the range, and its deduplicated sum of households. However, I'd like to sum the SumHouseholds from those rows = 60. When I try to nest SELECT statements for this:

SELECT DBAName, SUM(SumHouseholds) AS TotalHouseholds FROM (  
SELECT DBAName, SUM(Households) OVER (PARTITION BY ZIPCode) AS SumHouseholds  
FROM Business  
JOIN Location  
ON Location.ZIPCode = Business.ZIPCode  
WHERE ZIPCode like '1001%' and DBAName = 'ABC Plumbing')  
GROUP BY DBAName  
ORDER BY TotalHouseholds

It returns the the sum of Households for all matching records with duplicates. Since 3 records match the query for each ZIP code, the result is 30x3+25x3+5x3=180.

How should I change the query to return SUM(SumHouseholds)=60?

Mikhail Berlyant · Accepted Answer

Try below

SELECT DBAName, SUM(SumHouseholds) AS TotalHouseholds 
FROM (
  SELECT DBAName, ZIPCode, SUM(Households) AS SumHouseholds
  FROM Business
  JOIN Location
  ON Location.ZIPCode = Business.ZIPCode
  //WHERE ZIPCode LIKE '1001%' AND DBAName = 'ABC Plumbing'
  GROUP BY DBAName, ZIPCode
)
GROUP BY DBAName
ORDER BY TotalHouseholds

SUM of SUM in BigQuery without duplicates

Answers (1)

Related Questions