Reputation: 181
I am currently having a problem with a Standard SQL query. I have a list of emails where every email can have multiple functions. See the example below on how the table looks like.
Email Function
[email protected] engineer
[email protected] specialist
[email protected] analyst
[email protected] tester
[email protected] manager
[email protected] intern
What I want is a query that returns every email once with the first function it finds. So the above table should return the following:
Email Function
[email protected] engineer
[email protected] analyst
[email protected] intern
How do I do this?
What I have right now is a simplified version of the query.
SELECT Email, Function
FROM database
GROUP BY Email, Function
The issue is here is that I have to put both Email and Function in the GROUP BY. If I only put Email in the Group By the query cannot run even though I only want the query to GROUP BY Email.
Thanks!
Upvotes: 16
Views: 29928
Reputation: 11
SELECT Email, Function
FROM database
qualify row_number() over(partition by Email order by Function) == 1
Upvotes: 1
Reputation: 947
I suggest using ARRAY_AGG and OFFSET. It concatenates the different Function values while OFFSET(0) picks the first element of the array.
SELECT
Email,
ARRAY_AGG(Function)[OFFSET(0)] as Function
FROM database
GROUP BY Email
Upvotes: 0
Reputation: 11
Here's my hacky way to get the first value when you need to use Group By
SELECT
Email,
SPLIT(STRING_AGG(Function, ',' ORDER BY Function), ',')[SAFE_ORDINAL(1)] AS Function
FROM database
GROUP BY Email
First, combine value from field Function
into 1 separated by ,
and sorted by field Function
using STRING_AGG()
function.
Then get the first value using SPLIT()[SAFE_ORDINAL(1)]
.
You can change ORDER BY Function
with other fields (e.g. event timestamp).
Upvotes: 1
Reputation: 103
Have you considered the BigQuery Navigation Functions? For example, FIRST_VALUE. You can view this in the documents here and perhaps do something like the below:
SELECT Email,
FIRST_VALUE(Function)
OVER (PARTITION BY Email ORDER BY x ) AS First_Function
FROM database
However, the other comments hint to the problem of ordering.
So you would need some sort of ordering variable (x), like a time, or even an index.
Upvotes: 4
Reputation: 1269793
There is no such thing as the "first" function, because SQL tables represent unordered sets -- especially in a designed-from-the-ground-up parallel database such as BigQuery.
You need to use some sort of aggregation function.
A simple one is any_value()
:
SELECT Email, ANY_VALUE(Function)
FROM database
GROUP BY Email;
If you have another column that specifies the ordering, then it can be used to fetch the function
associated with the minimum value of that column.
Upvotes: 11
Reputation: 31993
use row_number()
analytic function
with cte as
(select * ,row_number() over(partition by email order by Function) rn
from table
) select * from cte where rn=1
Upvotes: 2
Reputation: 37473
Use row_number()
window function
select * from
(
select *, row_number() over(partition by email order by funcion) as rn
from tablename
)a where rn=1
Upvotes: 13
Reputation: 50163
You need ordering column that specify your column ordering then you can do :
select t.*
from table t
where t.pk = (select min(t1.pk) from table t1 where t1.email = t.email);
Here assuming pk
is identity column.
Upvotes: 0