Reputation: 3253
I would like to combine the entries from the two tables below to match obj
to class
via attr
.
I have many objects with different names, dates, and other information in a single table (not shown). Each of these objects can fall into classes based on their attributes. The mapping of objects obj
to each of their attributes attr
is in Table A. The mapping of classes class
to attributes attr
is in Table B.
I would like to map each object obj
to its matching class class
to give the desired output shown in Table C.
In some ways, this seems like a join operation on groups. Is there a way to do this in standard sql and/or with Google BigQuery?
An important point -- both attr
of a class
and of an obj
are not ordered.
Table A:
-------------------
| obj | attr |
-------------------
| obj1 | I |
| obj1 | II |
| obj2 | I |
| obj2 | II |
| obj3 | I |
| obj3 | II |
| obj3 | II |
| obj4 | III |
| obj4 | I |
-------------------
Table B:
-------------------
| attr | class |
-------------------
| I | A |
| II | A |
| I | B |
| III | B |
-------------------
Desired output (Table C):
-----------------------
| obj | class |
-----------------------
| obj1 | A |
| obj2 | A |
| obj4 | B |
-----------------------
(Note that obj3
doesn't fall into class
A
because it has an additional II
attr
.)
Upvotes: 0
Views: 32
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT obj, class FROM (
SELECT obj, STRING_AGG(attr ORDER BY attr) attr
FROM `project.dataset.TableA`
GROUP BY obj
) JOIN (
SELECT class, STRING_AGG(attr ORDER BY attr) attr
FROM `project.dataset.TableB`
GROUP BY class
) USING(attr)
You can test / play with it using dummy data from your question as below
#standardSQL
WITH `project.dataset.TableA` AS (
SELECT 'obj1' obj, 'I' attr UNION ALL
SELECT 'obj1', 'II' UNION ALL
SELECT 'obj2', 'I' UNION ALL
SELECT 'obj2', 'II' UNION ALL
SELECT 'obj3', 'I' UNION ALL
SELECT 'obj3', 'II' UNION ALL
SELECT 'obj3', 'II' UNION ALL
SELECT 'obj4', 'III' UNION ALL
SELECT 'obj4', 'I'
), `project.dataset.TableB` AS (
SELECT 'I' attr, 'A' class UNION ALL
SELECT 'II', 'A' UNION ALL
SELECT 'I', 'B' UNION ALL
SELECT 'III', 'B'
)
SELECT obj, class FROM (
SELECT obj, STRING_AGG(attr ORDER BY attr) attr
FROM `project.dataset.TableA`
GROUP BY obj
) JOIN (
SELECT class, STRING_AGG(attr ORDER BY attr) attr
FROM `project.dataset.TableB`
GROUP BY class
) USING(attr)
Output is:
obj class
---- -----
obj1 A
obj2 A
obj4 B
Upvotes: 2