saladi
saladi

Reputation: 3253

join on multiple/group of entries

I would like to combine the entries from the two tables below to match obj to class via attr.

I have many objects with different names, dates, and other information in a single table (not shown). Each of these objects can fall into classes based on their attributes. The mapping of objects obj to each of their attributes attr is in Table A. The mapping of classes class to attributes attr is in Table B.

I would like to map each object obj to its matching class class to give the desired output shown in Table C.

In some ways, this seems like a join operation on groups. Is there a way to do this in standard sql and/or with Google BigQuery?

An important point -- both attr of a class and of an obj are not ordered.

Table A:

-------------------
|  obj  |  attr   | 
-------------------
|  obj1 |     I   |
|  obj1 |     II  |  
|  obj2 |     I   |
|  obj2 |     II  |
|  obj3 |     I   |
|  obj3 |     II  |  
|  obj3 |     II  |
|  obj4 |     III |
|  obj4 |     I   |
-------------------

Table B:

-------------------
| attr  |  class  | 
-------------------
|  I    |    A    |
|  II   |    A    |  
|  I    |    B    |
|  III  |    B    |
-------------------

Desired output (Table C):

-----------------------
|  obj     |  class   | 
-----------------------
|  obj1    |    A     |
|  obj2    |    A     |
|  obj4    |    B     |
-----------------------

(Note that obj3 doesn't fall into class A because it has an additional II attr.)

Upvotes: 0

Views: 32

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)  

You can test / play with it using dummy data from your question as below

#standardSQL
WITH `project.dataset.TableA` AS (
  SELECT 'obj1' obj, 'I' attr UNION ALL
  SELECT 'obj1', 'II' UNION ALL
  SELECT 'obj2', 'I' UNION ALL
  SELECT 'obj2', 'II' UNION ALL
  SELECT 'obj3', 'I' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj4', 'III' UNION ALL
  SELECT 'obj4', 'I' 
), `project.dataset.TableB` AS (
  SELECT 'I' attr, 'A' class UNION ALL
  SELECT 'II', 'A' UNION ALL
  SELECT 'I', 'B' UNION ALL
  SELECT 'III', 'B' 
)
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)   

Output is:

obj     class    
----    -----
obj1    A    
obj2    A    
obj4    B    

Upvotes: 2

Related Questions