Jeffrey Kramer
Jeffrey Kramer

Reputation: 1345

Difficult SQL join across tables

I have two tables that join easily and a third that is giving me trouble. First, the tables:

tbl_customer
id  dt           value  group
a   2013-01-01   10     cavalry
a   2012-06-01   20     lance
a   2011-03-01   10     infantry
b   2013-01-01   20     court
b   2012-07-01   5      maiden
b   2005-06-01   10     chivalry

tbl_title
id  dt_active    dt_inactive   title
a   2001-01-01   2012-01-01    mister
a   2012-01-02   3001-01-01    sir
a   2012-01-02   3001-01-01    king
b   2001-01-01   2012-01-01    miss
b   2012-01-02   3001-01-01    lady
b   2012-01-02   3001-01-01    queen

This is easy to join to return a title for a given ID using:

SELECT 
  id, dt, value, title
FROM
  tbl_customer AS cust
INNER JOIN tbl_title AS titles
  ON titles.id = cust.id
  AND dt >= titles.dt_active
  AND dt <= titles.dt_inactive

This returns all of the rows in tbl_customer with some duplicates where there are multiple "active" titles. For example, the first row in tbl_customer returns twice, once with "king" and once with "sir" as a title.

I have a third table that links groups to titles. It can help resolve these relationships:

tbl_group
group     title   rank
cavalry   sir     10
lance     king    20
infantry  mister  30
court     lady    10
court     queen   20
maiden    lady    10
chivalry  miss    5

I can then join the tables to include the title from tbl_group and limit the results to those that match:

SELECT 
  id, dt, value, titles.title
FROM
  tbl_customer AS cust
INNER JOIN tbl_title AS titles
  ON titles.id = cust.id
  AND dt >= titles.dt_active
  AND dt <= titles.dt_inactive
INNER JOIN tbl_group AS group
  ON group.group = cust.group
WHERE
  titles.title = group.title

This largely solves my problems. The third table helps to remove the ambiguity when there are multiple "active" records in tbl_title.

However, there is a problem. The third table can also have multiple titles associated with a group. Therefore, there can still be duplication in the final output. In the example above, "cavalry" is unambiguously associated with the title "sir". However, the group "court" can be either "lady" or "queen", and tbl_title isn't any help either, since both "lady" and "queen" are active for the date in tbl_customer.

At this point, I have no other way to choose other than "rank" in tbl_group. "Queen" outranks "lady" 20 to 10, so I want to use that title for the entry in tbl_customer. If the ranks are equal, I want to just use the first record in tbl_group. (Could this be resolved with some kind of internal ordering?)

The tricky part is that the active/inactive dynamic needs to be taken into account before the rank. For example, I don't want to first filter tbl_group for one title per group, since that would prevent possible matches based on date.

Ideally, I need to return one row per entry from tbl_customer that shows title, first based on active/inactive date. Next, I want to eliminate duplicates using tbl_group. Finally, I want to limit the results to just one row per entry in tbl_customer using matching and rank from tbl_group. Is this even possible?

Upvotes: 3

Views: 125

Answers (3)

Thomas
Thomas

Reputation: 64635

First, some actual table schema would be helpful as the table data you have provided is missing some key elements. What are the keys in each of the tables? I.e., what can be used to uniquely identify a row in each of the tables? What does the id column in the customer and title table represent? Surely not a customer identifier since there are duplicates.

Second, to answer your question, you can use a ranking function such as Row_Number to rank the items in your list and only return the first item for a given set:

With RnkItems As
  (
  Select C.id, C.dt, C.value, T.Title
    , Row_Number() Over ( Partition By C.id, C.dt, C.value, C.[group]
                          Order By G.rank Desc ) As Rnk
  From tbl_customer As C
    Join tbl_title As T
      On T.id = C.id
        And C.dt Between T.dt_active And T.dt_inactive
    Join tbl_group As G
      On G.[group] = C.[group]
        And G.title = T.title
  )
Select id, dt, value, title
From RnkItems
Where Rnk = 1

The key is the Partition By C.id, C.dt, C.value, C.[group] statement. Why all four columns? The reason is that I have no idea what uniquely identifies a customer row. If there were a primary key on that table, we could simply partition by that.

SQL Fiddle version

Upvotes: 2

Vulcronos
Vulcronos

Reputation: 3456

Try this:

SELECT *
FROM
(
SELECT 
  id, 
  dt, 
  value, 
  titles.title, 
  group.rank MAX(rank) OVER(PARTITION BY cust.id) AS MAX_RANK
FROM
  tbl_customer AS cust
INNER JOIN tbl_title AS titles
  ON titles.id = cust.id
  AND dt BETWEEN titles.dt_active AND titles.dt_inactive
INNER JOIN tbl_group AS group
  ON group.group = cust.group
WHERE
  titles.title = group.title
) AS RESULTS
WHERE rank = MAX_RANK

The windowing function MAX(rank) OVER(PARTITION BY cust.id) will get the max rank per id. Then we filter the rows to only return the row whose rank is that customer's maximum rank.

Upvotes: 1

Quassnoi
Quassnoi

Reputation: 425331

SELECT  *
FROM    (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY g.group, g.title ORDER BY g.rank DESC) rn
        FROM    tbl_customer с
        JOIN    tbl_title t
        ON      t.id = c.id
                AND c.dt BETWEEN t.dt_active AND t.dt_inactive
        JOIN    tbl_group g
        ON      g.[group] = c.[group]
                AND g.title = c.title
        ) q
WHERE   rn = 1

Upvotes: 0

Related Questions