Reputation: 97
I am using BigQuery at work and I am doing some data validation. As part of this I am trying to find all columns in a given table that consist entirely of null
values. I know that I can query each column individually using something like
SELECT count(id), <column_name> FROM <dataset>.<table>
WHERE <column_name> IS NOT NULL
GROUP EACH BY 2 HAVING count(id) = 0
I would rather not have to do this for every column in the table as there are a large number of them.
Upvotes: 1
Views: 3474
Reputation: 11
SELECT
column_name, COUNT(1) AS nulls_count
FROM
`projectID.Dataset.Table`,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(`projectID.Dataset.Table`), r'"(\w+)":null')) column_name
GROUP BY
column_name
ORDER BY
nulls_count DESC
The above BigQuery query gives the number of records with a NULL value per column, grouped by column name. Also note that the query will not return columns that have no NULL values at all.
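To go from NULL counts to the columns that are entirely NULL, a minimal sketch (reusing the same placeholder table name) is to compare each column's NULL count with the table's total row count in a HAVING clause:
-- Sketch only: `projectID.Dataset.Table` is a placeholder; a column is entirely NULL
-- when its NULL count equals the table's row count.
SELECT
  column_name
FROM
  `projectID.Dataset.Table` t,
  UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(\w+)":null')) column_name
GROUP BY
  column_name
HAVING
  COUNT(1) = (SELECT COUNT(1) FROM `projectID.Dataset.Table`)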
Upvotes: 1
Reputation: 1
To add to the above answer: it works only for integer columns. If there are columns of other types, you can do the following:
SELECT SUM(IF(column1 IS NULL, 0, 1)), SUM(IF(column2 IS NULL, 0, 1)) FROM table
In this case, irrespective of the data type, null values are counted as 0 and all other values as 1. If the sum for a column is 0, then that column contains only null values.
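As a minimal sketch of that pattern (the inline rows and column names below are made up), a column whose sum comes back as 0 is entirely null regardless of its type:
-- Made-up example data: column_a is entirely NULL, column_b is not.
SELECT
  SUM(IF(column_a IS NULL, 0, 1)) AS column_a_non_nulls,  -- 0, so column_a is all NULLs
  SUM(IF(column_b IS NULL, 0, 1)) AS column_b_non_nulls   -- 2
FROM (
  SELECT CAST(NULL AS STRING) AS column_a, 'x' AS column_b
  UNION ALL
  SELECT CAST(NULL AS STRING), 'y'
)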
Upvotes: 0
Reputation: 3251
This can be done without running a query per column with something like the following. Take a SUM of the non-null values in a SELECT, and look for the columns that contain zero non-nulls.
SELECT
SUM(column_1 IS NOT NULL),
SUM(column_2 IS NOT NULL),
SUM(column_3 IS NOT NULL)
FROM
(SELECT NULL AS column_1, NULL AS column_2, 17 AS column_3),
(SELECT 18 AS column_1, NULL AS column_2, 19 AS column_3)
In this case, the results are:
| column_1 | column_2 | column_3 |
|----------|----------|----------|
| 1 | 0 | 2 |
which tells us that column_2 contains only nulls.
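This answer uses legacy SQL, where a boolean expression can be summed directly and the comma between the two subselects acts as a UNION ALL. A rough standard SQL sketch of the same check, on the same made-up rows, could use COUNTIF instead:
-- Standard SQL sketch; COUNTIF(condition) counts rows where the condition is TRUE.
SELECT
  COUNTIF(column_1 IS NOT NULL) AS column_1_non_nulls,
  COUNTIF(column_2 IS NOT NULL) AS column_2_non_nulls,
  COUNTIF(column_3 IS NOT NULL) AS column_3_non_nulls
FROM (
  SELECT NULL AS column_1, NULL AS column_2, 17 AS column_3
  UNION ALL
  SELECT 18, NULL, 19
)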
Upvotes: 3