Reputation: 5735
I have a SQL table (actually a BigQuery
table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?
It is impossible for me to list all the columns. Looking for ways to do something like
SELECT MAX(*) FROM mytable;
and then running
SELECT MIN(*) FROM mytable;
I have been unable to Google a way of doing that. Not sure that's even possible.
For example, if my table has the following schema:
col1 col2 col3 .... col1000
the (say, max) query should return
Row col1 col2 col3 ... col1000
1 3 18 0.6 ... 45
and the min query should return (say)
Row col1 col2 col3 ... col1000
1 -5 4 0.1 ... -5
The numbers are just for illustration. The column names could be different strings and not easily scriptable.
Upvotes: 2
Views: 4326
Reputation: 172994
See below example for BigQuery Standard SQL - it works for any number of columns and does not require explicit calling/use of columns names
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
)
SELECT
MIN(CAST(value AS INT64)) AS min_value,
MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value
with result
Row min_value max_value
1 -1 11
Note: if your columns are of STRING data type - you should remove CAST ... AS INT64
Or if they are of FLOAT64 - replace INT64 with FLOAT64 in the CAST function
Update
Below is option to get MIN/Max for each column and present result as array of respective values as list of respective values in the order of the columns
#standardSQL
WITH `project.dataset.mytable` AS (
SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
SELECT 7,6,5,4 UNION ALL
SELECT -1, 11, 5, 8
), temp AS (
SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
FROM `project.dataset.mytable` t,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
GROUP BY pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos)) FROM temp
with result as
Row stats vals
1 min_values [-1,2,3,4]
2 max_values [7,11,5,14]
Hope this is something you can still apply to whatever your final goal
Upvotes: 4