Reputation: 1345
I have two tables that join easily and a third that is giving me trouble. First, the tables:
tbl_customer
id dt value group
a 2013-01-01 10 cavalry
a 2012-06-01 20 lance
a 2011-03-01 10 infantry
b 2013-01-01 20 court
b 2012-07-01 5 maiden
b 2005-06-01 10 chivalry
tbl_title
id dt_active dt_inactive title
a 2001-01-01 2012-01-01 mister
a 2012-01-02 3001-01-01 sir
a 2012-01-02 3001-01-01 king
b 2001-01-01 2012-01-01 miss
b 2012-01-02 3001-01-01 lady
b 2012-01-02 3001-01-01 queen
This is easy to join to return a title for a given ID using:
SELECT
id, dt, value, title
FROM
tbl_customer AS cust
INNER JOIN tbl_title AS titles
ON titles.id = cust.id
AND dt >= titles.dt_active
AND dt <= titles.dt_inactive
This returns all of the rows in tbl_customer
with some duplicates where there are multiple "active" titles. For example, the first row in tbl_customer
returns twice, once with "king" and once with "sir" as a title
.
I have a third table that links groups to titles. It can help resolve these relationships:
tbl_group
group title rank
cavalry sir 10
lance king 20
infantry mister 30
court lady 10
court queen 20
maiden lady 10
chivalry miss 5
I can then join the tables to include the title from tbl_group
and limit the results to those that match:
SELECT
id, dt, value, titles.title
FROM
tbl_customer AS cust
INNER JOIN tbl_title AS titles
ON titles.id = cust.id
AND dt >= titles.dt_active
AND dt <= titles.dt_inactive
INNER JOIN tbl_group AS group
ON group.group = cust.group
WHERE
titles.title = group.title
This largely solves my problems. The third table helps to remove the ambiguity when there are multiple "active" records in tbl_title
.
However, there is a problem. The third table can also have multiple titles associated with a group. Therefore, there can still be duplication in the final output. In the example above, "cavalry" is unambiguously associated with the title "sir". However, the group "court" can be either "lady" or "queen", and tbl_title
isn't any help either, since both "lady" and "queen" are active for the date in tbl_customer
.
At this point, I have no other way to choose other than "rank" in tbl_group
. "Queen" outranks "lady" 20 to 10, so I want to use that title for the entry in tbl_customer
. If the ranks are equal, I want to just use the first record in tbl_group
. (Could this be resolved with some kind of internal ordering?)
The tricky part is that the active/inactive dynamic needs to be taken into account before the rank. For example, I don't want to first filter tbl_group
for one title per group, since that would prevent possible matches based on date.
Ideally, I need to return one row per entry from tbl_customer
that shows title, first based on active/inactive date. Next, I want to eliminate duplicates using tbl_group
. Finally, I want to limit the results to just one row per entry in tbl_customer
using matching and rank from tbl_group
. Is this even possible?
Upvotes: 3
Views: 125
Reputation: 64635
First, some actual table schema would be helpful as the table data you have provided is missing some key elements. What are the keys in each of the tables? I.e., what can be used to uniquely identify a row in each of the tables? What does the id
column in the customer and title table represent? Surely not a customer identifier since there are duplicates.
Second, to answer your question, you can use a ranking function such as Row_Number
to rank the items in your list and only return the first item for a given set:
With RnkItems As
(
Select C.id, C.dt, C.value, T.Title
, Row_Number() Over ( Partition By C.id, C.dt, C.value, C.[group]
Order By G.rank Desc ) As Rnk
From tbl_customer As C
Join tbl_title As T
On T.id = C.id
And C.dt Between T.dt_active And T.dt_inactive
Join tbl_group As G
On G.[group] = C.[group]
And G.title = T.title
)
Select id, dt, value, title
From RnkItems
Where Rnk = 1
The key is the Partition By C.id, C.dt, C.value, C.[group]
statement. Why all four columns? The reason is that I have no idea what uniquely identifies a customer row. If there were a primary key on that table, we could simply partition by that.
Upvotes: 2
Reputation: 3456
Try this:
SELECT *
FROM
(
SELECT
id,
dt,
value,
titles.title,
group.rank MAX(rank) OVER(PARTITION BY cust.id) AS MAX_RANK
FROM
tbl_customer AS cust
INNER JOIN tbl_title AS titles
ON titles.id = cust.id
AND dt BETWEEN titles.dt_active AND titles.dt_inactive
INNER JOIN tbl_group AS group
ON group.group = cust.group
WHERE
titles.title = group.title
) AS RESULTS
WHERE rank = MAX_RANK
The windowing function MAX(rank) OVER(PARTITION BY cust.id)
will get the max rank per id. Then we filter the rows to only return the row whose rank is that customer's maximum rank.
Upvotes: 1
Reputation: 425331
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY g.group, g.title ORDER BY g.rank DESC) rn
FROM tbl_customer с
JOIN tbl_title t
ON t.id = c.id
AND c.dt BETWEEN t.dt_active AND t.dt_inactive
JOIN tbl_group g
ON g.[group] = c.[group]
AND g.title = c.title
) q
WHERE rn = 1
Upvotes: 0