blarghmatey
blarghmatey

Reputation: 97

Is it possible to list all columns without data in BigQuery

I am using BigQuery at work and I am doing some data validation. As part of this I am trying to find all columns in a given table that consist entirely of null values. I know that I can query each column individually using something like

SELECT count(id), <column_name> FROM <dataset>.<table>
WHERE <column_name> IS NOT NULL
GROUP EACH BY 2 HAVING count(id) = 0

I would rather not have to do this for every column in the table as there are a large number of them.

Upvotes: 1

Views: 3474

Answers (3)

Nature Labs
Nature Labs

Reputation: 11

SELECT
    column_name, COUNT(1) AS nulls_count
FROM
    `projectID.Dataset.Table`,
    UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(`projectID.Dataset.Table`), r'"(\w+)":null')) column_name
GROUP BY
    column_name
ORDER BY
    nulls_count DESC

The above bigquery gives the number of records having NULL Value and group the columns. Also note that query will not give non null columns.

Upvotes: 1

Jey Raj
Jey Raj

Reputation: 1

To add to the above answer,

the above answer works only for integer columns, if there are columns of other type you can do the following

SELECT SUM(IF(column1 IS NULL, 0, 1)), SUM(IF(column2 IS NULL, 0, 1)) FROM table

In this case irrespective of the data type, all null values are considered as 0 and other values are considered as 1. If the result is 0, then the column contains all values as null.

Upvotes: 0

Danny Kitt
Danny Kitt

Reputation: 3251

This can be done without running a query per column with something like the following. Take a SUM of the non-null values in a SELECT, and look for the columns that contain zero non-nulls.

SELECT
  SUM(column_1 IS NOT NULL),
  SUM(column_2 IS NOT NULL),
  SUM(column_3 IS NOT NULL)
FROM
  (SELECT NULL AS column_1, NULL AS column_2, 17 AS column_3),
  (SELECT 18 AS column_1, NULL AS column_2, 19 AS column_3)

In this case, the results are:

| column_1 | column_2 | column_3 |
|----------|----------|----------|
|        1 |        0 |        2 |

which tells us that column_2 contains only nulls.

Upvotes: 3

Related Questions