Reputation: 25
I have a set of data that is structured as such:
[user_id, title, country, platform, language]
[100, 'Title A', 'US', 'Windows', 'English']
[100, 'Title A', 'US', 'Android', 'English']
[200, 'Title C', 'FR', 'Windows', 'French']
[300, 'Title B', 'US', 'Windows', 'English']
And so on...
I need to transform this data so that I count the unique number of users in each category.
If I was to write the query:
SELECT
title
, country
, platform
, language
count(distinct user_id)
FROM table
GROUP BY 1
, 2
, 3
, 4
The resulting table would look like:
[title, country, platform, language, unique_count]
['Title A', 'US', 'Windows', 'English', 10,000]
['Title A', 'US', 'Android', 'English', 7,000]
['Title C', 'FR', 'Windows', 'France', 4,000]
['Title B', 'US', 'Windows', 'English', 8,000]
And so on...
If I was to isolate individual dimensions there will be overlap as a user may be in multiple categories.
How could I structure the data in a way that the rows are inclusive and may be tabulated in a dashboard for example?
If there were only two categories this seems like a simpler problem as the data could be formatted as a cube:
| Windows | Android |
--------+---------+---------+----
Title A | 10,000 | 7,000 | 17,000
--------+---------+---------+----
Title B | 8,000 | 11,000 | 19,000
--------+---------+---------+----
| 19,000 | 18,000 |
Does something like an n-dimensional structure exists that could include all dimensions?
The other issue is that the data must be aggregated and cannot be simply pivoted as it is too large to fit in memory.
Upvotes: 0
Views: 284
Reputation: 1270091
If you want all combinations, then use with cube
:
SELECT title, country, platform, language,
count(unique user_id)
FROM table
GROUP BY title, country, platform, language with cube;
More commonly, I prefer GROUPING SETS
. For instance, to get all pairs:
SELECT title, country, platform, language,
count(unique user_id)
FROM table
GROUP BY ( (title, country),
(title, platform),
(title, language),
(country, platform),
(country, language),
(platform, language)
);
Upvotes: 1