Fastest way to select distinct values on multiple columns individually

Question

Let's say I have the following data:

+-------+---------+
| col1  | col2    |
+-------+---------+
|  1    | a       |
|  2    | a       |
|  3    | a       |
|  2    | a       |
|  5    | d       |
|  5    | b       |
+-------+---------+

I would like to write a query that returns two columns: the column name and each of that column's distinct values. That is:

+---------+---------+
| column  | value   |
+---------+---------+
| col1    | 1       |
| col1    | 2       |
| col1    | 3       |
| col1    | 5       |
| col2    | a       |
| col2    | b       |
| col2    | d       |
+---------+---------+

I could achieve this with the following query:

    SELECT DISTINCT 'col1' AS "column", col1 AS value FROM db
    UNION ALL
    SELECT DISTINCT 'col2' AS "column", col2 AS value FROM db

It works fine, but my real database has more than 300 million rows and 300+ columns. I expect the 300+ UNION ALL branches to slow the query down considerably. Is there a faster way? The aggregated results will be fetched into R/Python, so a bit of extra manipulation on a much smaller table is fine.
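To make the client-side option concrete, here is a minimal sketch in Python. It assumes SQLite through the standard `sqlite3` module (the real engine may differ), and `distinct_per_column` is a hypothetical helper name. It runs one `SELECT DISTINCT` per column and assembles the (column, value) pairs in Python instead of in a `UNION ALL`. It only illustrates the shape of the result on the sample data; whether it is any faster on a large table would depend on the engine and its indexes.

```python
import sqlite3


def distinct_per_column(conn, table, columns):
    """Return (column, value) pairs, one per distinct value of each column.

    Hypothetical helper for illustration; table and column names are
    assumed to come from a trusted source, since they are interpolated.
    """
    rows = []
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are quoted inline.
        query = f'SELECT DISTINCT "{col}" FROM "{table}" ORDER BY 1'
        rows.extend((col, value) for (value,) in conn.execute(query))
    return rows


# Rebuild the sample table from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO db VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "a"), (2, "a"), (5, "d"), (5, "b")],
)

result = distinct_per_column(conn, "db", ["col1", "col2"])
for column, value in result:
    print(column, value)
```

On the sample data this prints the same seven rows as the desired output table above.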
