Reputation: 13088

Google BigQuery aggregation in one column resolves the other?

Given a table like this (named my_table)

| A  | B | C |
+----+---+---+
| -1 | a | x |
|  0 | b | x |
|  1 | c | x |
|  2 | d | y |
|  3 | e | z |

How do I do the equivalent of the following in Google BigQuery?

SELECT MIN(A) as A_min, B, C
FROM my_table
GROUP BY C
WHERE A > 0

The expected result is

| A_min  | B | C |
+--------+---+---+
|  1     | c | x |
|  2     | d | y |
|  3     | e | z |

The query above works fine in SQL (and just about any other SQL-ish engine I know of), but in BigQuery, it returns the error:

SELECT list expression references B which is neither grouped nor aggregated

I get why that error happens. My question is: how do I find the corresponding row in column B for MIN(A), WHERE A > 0?

Edit

Some of the proposed answers worked great for the trivial sample table that only includes one unique value for C. In a more realistic case, there will be many values for C, hence the need to aggregate on it.

Upvotes: 0

Answers (4)

Mikhail Berlyant

Reputation: 173046

Below is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.my_table` AS (
  SELECT -1 A, 'a' B, 'x' C UNION ALL
  SELECT 0, 'b', 'x' UNION ALL
  SELECT 1, 'c', 'x' UNION ALL
  SELECT 2, 'd', 'x' UNION ALL
  SELECT 3, 'e', 'x' 
)
SELECT ARRAY_AGG(STRUCT(A AS A_min, B) ORDER BY A LIMIT 1)[OFFSET(0)].*, C 
FROM `project.dataset.my_table`
WHERE A > 0
GROUP BY C

with result

Row A_min   B   C    
1   1       c   x

A little less verbose option would be

#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY A LIMIT 1)[OFFSET(0)] 
FROM `project.dataset.my_table` t
WHERE A > 0
GROUP BY C

with result

Row A   B   C    
1   1   c   x

Upvotes: 2

Tim Biegeleisen

Reputation: 521997

You could use ROW_NUMBER here:

SELECT A, B, C
FROM
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY A) rn
    FROM my_table
    WHERE A > 0
) t
WHERE rn = 1;

This approach has the advantage of flexibility, because if you instead want all smallest A records with ties, all you need to do is replace ROW_NUMBER with RANK.

Upvotes: 2

Ed Bangga

Reputation: 13006

Include the columns on your group by clause.

SELECT MIN(A) as A_min, B, C
FROM my_table
WHERE A > 0
GROUP BY C, B

 SELECT t2.*, t1.C 
    FROM my_table t1 
    INNER JOIN 
       (SELECT MIN(A) as A_min, B
        FROM my_table
        WHERE A > 0
        GROUP BY B) t2 on t2.A_min = t1.A and t2.B = t1.B

Upvotes: 1

arcee123

Reputation: 243

Try this:

SELECT * 
FROM my_table
WHERE A = (select min(A) from my_table where A > 0)

you can also try:

Select B
from my_table
where A > 0 
order by A ASC
LIMIT 1;

depending on the need of this query.

Upvotes: 0

Google BigQuery aggregation in one column resolves the other?

Edit

Answers (4)

Related Questions