Reputation: 165
I am new to SQL. I have a table that has billion of records with multiple columns, let's say a,b,c,d,e,f.
I would like to create a query for number of records for each column with certain condition and the next column is the percentage of that result from total number of records, then column b, c,d and e with the same condition. I would like to get the output of my query in a single row.
So, for column a, the query that I would like is something like this:
select count(a) from table 1
where a >0 and date > '2020-01-01'
the next column would the percentage of the above results relative to total number of records using query like this
select count(*) from table 1
where date >'2020-01-01'
Column 3 will be the result of this query
select count(b) from table 1
where a >0 and date > '2020-01-01'
column 4 will be the percentage of the number of records of column b with the avove condition relative to total number records
and the same thing for column c,d, and e.
The expected results would be something like this
number of records for column a with condition, percentage of a with condition relative to total records, number of records for column b with condition, percentage of b with condition relative to total records, .....................
How should the query be written correctly?
Upvotes: 0
Views: 51
Reputation: 222482
Use conditional aggregation:
select
avg(case
when a > 0 and date > '2020-01-01' then 1.0
when date > '2020-01-01' then 0
end) avg_a,
avg(case
when b > 0 and date > '2020-01-01' then 1.0
when date > '2020-01-01' then 0
end) avg_b
...
from mytable
This gives you the ability to compute metrics according to various conditions that are wrapped in the case
expression. The above computes the ratio of records whose column a
, b
, ... is greater than 0 and whose date is 2020 or more against the count of record that satisfy the date condition only. Other records are ignored.
Note that if the date conditions are the same for all case
s, you could just move it to the where
clause, which simplifies the logic:
select
avg(case when a > 0 then 1.0 else 0 end) avg_a,
avg(case when b > 0 then 1.0 else 0 end) avg_b
...
from mytable
where '2020-01-01'
Upvotes: 1
Reputation: 1269953
Just use conditional aggregation. Here is one method:
select avg(case when a > 0 then 1.0 else 0 end)
from table
where date >'2020-01-01'
Or, what is a wee bit shorter in BigQuery:
select countif(a > 0) / count(*)
from table
where date >'2020-01-01' ;
You can repeat either of these for the rest of the columns.
Upvotes: 0