busheriff
busheriff

Reputation: 165

Combining query results for number of records in a single row

I am new to SQL. I have a table that has billion of records with multiple columns, let's say a,b,c,d,e,f.

I would like to create a query for number of records for each column with certain condition and the next column is the percentage of that result from total number of records, then column b, c,d and e with the same condition. I would like to get the output of my query in a single row.

So, for column a, the query that I would like is something like this:

select count(a) from table 1
where a >0 and date > '2020-01-01'

the next column would the percentage of the above results relative to total number of records using query like this

select count(*) from table 1
where date >'2020-01-01'

Column 3 will be the result of this query

select count(b) from table 1
where a >0 and date > '2020-01-01'

column 4 will be the percentage of the number of records of column b with the avove condition relative to total number records

and the same thing for column c,d, and e.

The expected results would be something like this

number of records for column a with condition, percentage of a with condition relative to total records, number of records for column b with condition, percentage of b with condition relative to total records, .....................

How should the query be written correctly?

Upvotes: 0

Views: 51

Answers (2)

GMB
GMB

Reputation: 222482

Use conditional aggregation:

select
    avg(case 
        when a > 0 and date > '2020-01-01' then 1.0
        when date > '2020-01-01' then 0
    end) avg_a,
    avg(case 
        when b > 0 and date > '2020-01-01' then 1.0
        when date > '2020-01-01' then 0
    end) avg_b
...
from mytable

This gives you the ability to compute metrics according to various conditions that are wrapped in the case expression. The above computes the ratio of records whose column a, b, ... is greater than 0 and whose date is 2020 or more against the count of record that satisfy the date condition only. Other records are ignored.

Note that if the date conditions are the same for all cases, you could just move it to the where clause, which simplifies the logic:

select
    avg(case when a > 0 then 1.0 else 0 end) avg_a,
    avg(case when b > 0 then 1.0 else 0 end) avg_b
...
from mytable
where '2020-01-01'

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

Just use conditional aggregation. Here is one method:

select avg(case when a > 0 then 1.0 else 0 end)
from table 
where date >'2020-01-01'

Or, what is a wee bit shorter in BigQuery:

select countif(a > 0) / count(*)
from table
where date >'2020-01-01' ;

You can repeat either of these for the rest of the columns.

Upvotes: 0

Related Questions