Peter Nagel
Peter Nagel

Reputation: 181

BigQuery: Return First Value from Different Groups in a Group By

I am currently having a problem with a Standard SQL query. I have a list of emails where every email can have multiple functions. See the example below on how the table looks like.

Email                         Function
[email protected]               engineer
[email protected]               specialist
[email protected]                analyst
[email protected]                tester
[email protected]                manager
[email protected]             intern

What I want is a query that returns every email once with the first function it finds. So the above table should return the following:

Email                         Function
[email protected]               engineer
[email protected]                analyst
[email protected]             intern

How do I do this?

What I have right now is a simplified version of the query.

SELECT Email, Function
FROM database
GROUP BY Email, Function

The issue is here is that I have to put both Email and Function in the GROUP BY. If I only put Email in the Group By the query cannot run even though I only want the query to GROUP BY Email.

Thanks!

Upvotes: 16

Views: 29928

Answers (8)

Nikita Koss
Nikita Koss

Reputation: 11

SELECT Email, Function
FROM database
qualify row_number() over(partition by Email order by Function) == 1

Upvotes: 1

Antiez
Antiez

Reputation: 947

I suggest using ARRAY_AGG and OFFSET. It concatenates the different Function values while OFFSET(0) picks the first element of the array.

SELECT 
   Email, 
   ARRAY_AGG(Function)[OFFSET(0)] as Function
FROM database
GROUP BY Email

Upvotes: 0

Ferdyanto Chandra
Ferdyanto Chandra

Reputation: 11

Here's my hacky way to get the first value when you need to use Group By

SELECT
  Email,
  SPLIT(STRING_AGG(Function, ',' ORDER BY Function), ',')[SAFE_ORDINAL(1)] AS Function
FROM database
GROUP BY Email

First, combine value from field Function into 1 separated by , and sorted by field Function using STRING_AGG() function. Then get the first value using SPLIT()[SAFE_ORDINAL(1)].

You can change ORDER BY Function with other fields (e.g. event timestamp).

Upvotes: 1

K_B1
K_B1

Reputation: 103

Have you considered the BigQuery Navigation Functions? For example, FIRST_VALUE. You can view this in the documents here and perhaps do something like the below:

SELECT Email, 
    FIRST_VALUE(Function)
        OVER (PARTITION BY Email ORDER BY x ) AS First_Function
FROM database

However, the other comments hint to the problem of ordering.

So you would need some sort of ordering variable (x), like a time, or even an index.

Upvotes: 4

Gordon Linoff
Gordon Linoff

Reputation: 1269793

There is no such thing as the "first" function, because SQL tables represent unordered sets -- especially in a designed-from-the-ground-up parallel database such as BigQuery.

You need to use some sort of aggregation function.

A simple one is any_value():

SELECT Email, ANY_VALUE(Function)
FROM database
GROUP BY Email;

If you have another column that specifies the ordering, then it can be used to fetch the function associated with the minimum value of that column.

Upvotes: 11

Zaynul Abadin Tuhin
Zaynul Abadin Tuhin

Reputation: 31993

use row_number() analytic function

with cte as
(select * ,row_number() over(partition by email order by Function) rn
from table
) select * from cte where rn=1

Upvotes: 2

Fahmi
Fahmi

Reputation: 37473

Use row_number() window function

select * from
(
select *, row_number() over(partition by email order by funcion) as rn
from tablename
)a where rn=1

Upvotes: 13

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

You need ordering column that specify your column ordering then you can do :

select t.*
from table t
where t.pk = (select min(t1.pk) from table t1 where t1.email = t.email);

Here assuming pk is identity column.

Upvotes: 0

Related Questions