Nik
Nik

Reputation: 5735

Find min max over all columns without listing down each column name in SQL

I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?

It is impossible for me to list all the columns. Looking for ways to do something like

SELECT MAX(*) FROM mytable;

and then running

SELECT MIN(*) FROM mytable;

I have been unable to Google a way of doing that. Not sure that's even possible.

For example, if my table has the following schema:

col1  col2  col3  .... col1000

the (say, max) query should return

Row  col1  col2 col3  ... col1000
1    3     18   0.6   ... 45

and the min query should return (say)

Row  col1  col2 col3  ... col1000
1    -5     4   0.1   ... -5

The numbers are just for illustration. The column names could be different strings and not easily scriptable.

Upvotes: 2

Views: 4326

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

See below example for BigQuery Standard SQL - it works for any number of columns and does not require explicit calling/use of columns names

#standardSQL
WITH `project.dataset.mytable` AS (
  SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
  SELECT 7,6,5,4 UNION ALL
  SELECT -1, 11, 5, 8
)
SELECT 
  MIN(CAST(value AS INT64)) AS min_value, 
  MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t, 
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value  

with result

Row min_value   max_value    
1   -1          11    

Note: if your columns are of STRING data type - you should remove CAST ... AS INT64
Or if they are of FLOAT64 - replace INT64 with FLOAT64 in the CAST function

Update

Below is option to get MIN/Max for each column and present result as array of respective values as list of respective values in the order of the columns

#standardSQL
WITH `project.dataset.mytable` AS (
  SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
  SELECT 7,6,5,4 UNION ALL
  SELECT -1, 11, 5, 8
), temp AS (
  SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
  FROM `project.dataset.mytable` t, 
  UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
  GROUP BY  pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos))  FROM temp 

with result as

Row stats       vals     
1   min_values  [-1,2,3,4]   
2   max_values  [7,11,5,14]    

Hope this is something you can still apply to whatever your final goal

Upvotes: 4

Related Questions