Reputation: 4797
I have a query that joins two Google tables and produces a table with 6 columns (`a`, `b`, `c`, `d`, `e`, `f`). Next, I export that table to a Google Cloud Storage bucket and then download the bucket's contents as a set of CSVs. Finally, I insert those CSVs into a Postgres table that has a composite primary key on `a` and `b`.
The problem is that there is no primary key in the Google table, and some of the rows differ in just one value, so I need to produce a query result that has no duplicates in `a` and `b`. I have tried the following...
```sql
SELECT
  DISTINCT A.a,
  A.b,
  A.c AS c,
  A.d AS d,
  B.c AS e,
  B.d AS f
FROM
  `my_proj.my_table_A` A
FULL OUTER JOIN
  `my_proj.my_table_B` B
ON
  A.a = B.a
  AND A.b = B.b
```
...but it still allows duplicates in `a` and `b` if, for example, `f` is different. I only need one of the duplicate rows, and it does not matter which one.
How do I remove duplicates based only on `a` and `b`?
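To make the failure mode concrete, here is a minimal repro sketch using SQLite through Python's `sqlite3` as a stand-in for the Google table (an assumption; the table name `t` and the sample values are made up):

```python
import sqlite3

# SELECT DISTINCT compares whole rows, so two rows that share (a, b)
# but differ in c both survive -- which is exactly what breaks a
# Postgres composite primary key on (a, b) at load time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INT, b INT, c TEXT);
    INSERT INTO t VALUES (1, 1, 'x'), (1, 1, 'y');
""")
rows = conn.execute("SELECT DISTINCT a, b, c FROM t ORDER BY c").fetchall()
print(rows)  # both rows come back despite the shared (a, b) pair
```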
Upvotes: 2
Views: 618
Reputation: 33705
Use `GROUP BY` with `ANY_VALUE`:
```sql
SELECT
  A.a,
  A.b,
  ANY_VALUE(A.c) AS c,
  ANY_VALUE(A.d) AS d,
  ANY_VALUE(B.c) AS e,
  ANY_VALUE(B.d) AS f
FROM
  `my_proj.my_table_A` A
FULL OUTER JOIN
  `my_proj.my_table_B` B
ON
  A.a = B.a
  AND A.b = B.b
GROUP BY
  A.a,
  A.b
```
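The same grouping behavior can be checked locally with SQLite through Python's `sqlite3` (an assumption, since the real query runs on BigQuery); here `MIN(c)` stands in for `ANY_VALUE(c)`, since both collapse each `(a, b)` group to a single row and the asker does not care which value survives:

```python
import sqlite3

# GROUP BY (a, b) guarantees exactly one output row per key pair,
# so the Postgres composite primary key on (a, b) can't be violated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INT, b INT, c TEXT);
    INSERT INTO t VALUES (1, 1, 'x'), (1, 1, 'y'), (2, 1, 'z');
""")
rows = conn.execute(
    "SELECT a, b, MIN(c) AS c FROM t GROUP BY a, b ORDER BY a, b"
).fetchall()
print(rows)  # one row per (a, b): [(1, 1, 'x'), (2, 1, 'z')]
```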
Upvotes: 3