Lukasavicus
Lukasavicus

Reputation: 154

Join Tables with multiple levels of aggregation

I have a huge problem cause I'm trying to do this:

[Edit]: Based upon the commentary of @gordon-linoff I remodeled my question with some concrete example

I have to join two tables Sales and Quota. Table Sales:

+---------+---------+---------+---------+---------+---------+
| Country |   Year  | Store   | Manager |  Vendor | Customer|
+---------+---------+---------+---------+---------+---------+
|    MX   |   2018  |Cid. Mex.| Orlando |  Luiz   |   001   |
|    MX   |   2018  |Cid. Mex.| Orlando |  Fabio  |   002   |
|    MX   |   2018  |Cid. Mex.| Orlando |  Luiz   |   003   |
|    MX   |   2018  |Cid. Mex.| Orlando |  Juan   |   004   |
|    MX   |   2018  |Cid. Mex.| Orlando |  Juan   |   005   |
|    MX   |   2018  |Cid. Mex.| Javier  |Hernandez|   007   |
...

And Quota:

+---------+---------+---------+---------+---------+---------+------------+
| Country |   Year  | Store   | Manager |  Vendor | Customer|  Target    |
+---------+---------+---------+---------+---------+---------+------------+
|    MX   |  2018   |Cid. Mex.| Orlando |  Luiz   |  001    |     1,01   |
|    MX   |  2018   |Cid. Mex.| Orlando |  Fabio  |         |     2,00   |
|    MX   |  2018   |Cid. Mex.| Orlando |  Luiz   |         |     3,05   |
|    MX   |  2018   |Cid. Mex.| Orlando |  Juan   |  004    |     2,71   |
|    MX   |  2018   |Cid. Mex.| Orlando |         |         |    14,25   |
|    MX   |  2018   |Cid. Mex.|         |         |         |     16,1   |
...

And I want to something like this in the end: (Sales JOIN Quota):

|                     SALES FIELDS                          ||   QUOTA    |
+---------+---------+---------+---------+---------+---------++------------+
| Country |   Year  | Store   | Manager |  Vendor | Customer||  Target    |
+---------+---------+---------+---------+---------+---------++------------+
|    MX   |   2018  |Cid. Mex | Orlando |  Luiz   |  001    ||     1,01   | *1
|    MX   |   2018  |Cid. Mex | Orlando |  Fabio  |  002    ||     2,00   | *2
|    MX   |   2018  |Cid. Mex | Orlando |  Luiz   |  003    ||     3,05   | *3
|    MX   |   2018  |Cid. Mex | Orlando |  Juan   |  004    ||     2,71   | *4
|    MX   |   2018  |Cid. Mex | Orlando |  Juan   |  004    ||    14,25   | *5
|    MX   |   2018  |Cid. Mex | Javier  |Hernandez|  004    ||     16,1   | *6
...

Explain the result I'm looking for:

  1. As the "key" country+year+store+manager+vendor+customer matches in both tables I brought the 'target' from table 'Quota'.
  2. In this example was defined a sales quota for the Vendor Fabio, regardless the customers, so in every row that Fabio appears (in the same country, same year, same store and same manager, as a "key"), must appear his quota.
  3. In this example, we see that is defined a quota for Luiz (as was defined a quota for Fabio), but this time Luiz has a quota for customer 001, as we saw in the example (1), but client 001 is a big client, and have a particular quota, for all other customers Luiz have a more opened quota.
  4. The same as example (1)
  5. In this example we see that a "default" value for quota to all Vendors. As Luiz, Fabio, Juan e Hernandez have their own quota, these values are not changed.
  6. Here we have the same concept of "default" value for quota, but in this time in a level above the previous example.

I think it is now clearer, but if something is still hindering understanding, please let me know.

Please help me people.

Thanks in advance for this.

Upvotes: 1

Views: 707

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 94859

Your wrong examples made the task look more complicated than it actually is :-) What you want boils down to: from the quota matches we find for a sales record take the most precise match.

In PostgreSQL you can do this with DISTINCT ON. In other DBMS you'd use ROW_NUMBER with FETCH FIRST ROW WITH TIES or the like.

select distinct on (s.customer, s.country, s.vendor, s.manager, s.store, s.year)
  s.*, q.target
from sales s
join quota q on  (q.country = s.country or q.country is null)
             and (q.year = s.year or q.year is null)
             and (q.store = s.store or q.store is null)
             and (q.manager = s.manager or q.manager is null)
             and (q.vendor = s.vendor or q.vendor is null)
             and (q.customer = s.customer or q.customer is null)
order by
  s.customer, s.country, s.vendor, s.manager, s.store, s.year,
  num_nonnulls(q.country, q.year, q.store, q.manager, q.vendor, q.customer) desc;

Upvotes: 0

Hambone
Hambone

Reputation: 16377

This example may not work perfectly, but I think you can accomplish what you want by using multiple left joins and a coalesce. Try this and let me know how close it is:

select
  s.*,
  coalesce (q1.target, q2.target, q3.target, q4.target) as target
from
  sales s
  left join quota q1 on
    s.country = q1.country and
    s.year = q1.year and
    s.manager = q1.manager and
    s.vendor = q1.vendor and
    s.customer = q1.customer
  left join quota q2 on
    s.country = q2.country and
    s.year = q2.year and
    s.manager = q2.manager and
    s.vendor = q2.vendor and
    q2.customer is null
  left join quota q3 on
    s.country = q3.country and
    s.year = q3.year and
    s.manager = q3.manager and
    q3.vendor is null and
    q3.customer is null
  left join quota q4 on
    s.country = q4.country and
    s.year = q4.year and
    q4.manager is null and
    q4.vendor is null and
    q4.customer is null

Essentially you are doing multiple joins to the same table and attempting to pick the join that has the most matches first, then cascading down until you find a match.

Upvotes: 2

Related Questions