SQL Server Group By Sets without duplicates

Question

I need to create a "rolled up" slash "grouped" view of Customer Data for our client.

A simplified explanation would be that data need to be grouped by geographical (ex. Country, Province, City etc.) data and rolled up by the amount of people that have an email address and/or a phone number.

The problem is that a person can be in more than one Cities (lowest level) and then are counted multiple times in any higher levels (ex Province).

Here is an example using GROUPING SETS:

DECLARE @Customer TABLE
(
    CustomerId VARCHAR(50),
    Phone BIT,
    Email BIT,
    ProvinceId VARCHAR(50),
    CityId VARCHAR(50)
)

INSERT INTO @Customer(CustomerId, Phone, Email, ProvinceId, CityId) VALUES ('Customer A', 1, NULL, 'Province A', 'City A')
INSERT INTO @Customer(CustomerId, Phone, Email, ProvinceId, CityId) VALUES ('Customer A', 1, NULL, 'Province A', 'City B')
INSERT INTO @Customer(CustomerId, Phone, Email, ProvinceId, CityId) VALUES ('Customer B', 1, 1, 'Province A', 'City B')

SELECT COUNT(Phone) PersonWithPhoneCount, COUNT(Email) PersonWithEmailCount, ProvinceId, CityId FROM @Customer
GROUP BY GROUPING SETS ((ProvinceId), (ProvinceId, CityId))

and this is the result:

----------------------------------------------------------------------------
| PersonWithPhoneCount  |  PersonWithEmailCount  |  ProvinceId  |  CityId  |
----------------------------------------------------------------------------
|                    1  |                     0  |  Province A  |  City A  |
|                    2  |                     1  |  Province A  |  City B  |
|                    3  |                     1  |  Province A  |   NULL   |
----------------------------------------------------------------------------

The result is correct for the lowest level (City) but for the Province level "Customer A" is counted twice. I understand why, but is there a way to not count "Customer A" twice?

Do I have to group all the different levels individually or is there a better way?

Performance is also a major issue as the live data adds up to 100+ million rows.

Thanks in advance.

SQL Server Group By Sets without duplicates

Answers (1)

Related Questions