erasmo carlos
erasmo carlos

Reputation: 682

How to JOIN and GROUP correctly

I am having trouble understanding how to join three tables together without creating duplicate records.

I have my first query which uses table profile, and tabe instruments:

SELECT
  [p].[shopper_id]
, [pi].[instrument_id]
FROM
  [dbo].[profile] [p]
INNER JOIN [dbo].[profile_instruments] [pi]
ON  [pi].[PID] = [p].[PID]
WHERE
  [p].[date_created] > DATEADD(yy, -2, GETDATE())
  AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
  [p].[shopper_id];

This query gives me the results:

shopper_id  instrument_id
53D5444535434747A935E207C9EDD96A    35
53D5444535434747A935E207C9EDD96A    17

My second query uses table profile and table styles:

SELECT
  [p].[shopper_id]
, [ps].[style_id]
FROM
  [dbo].[profile] [p]
INNER JOIN [dbo].[profile_styles] [ps]
ON  [ps].[PID] = [p].[PID]
WHERE
  [p].[date_created] > DATEADD(yy, -2, GETDATE())
  AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
  [p].[shopper_id];

The results are:

shopper_id  style_id
53D5444535434747A935E207C9EDD96A    845
53D5444535434747A935E207C9EDD96A    291

When I combine the 3 tables profile, instruments and styles:

SELECT
    [p].[shopper_id]
  , [pi].[instrument_id]
  , [ps].[style_id]
FROM
    [dbo].[profile] [p]
INNER JOIN [dbo].[profile_instruments] [pi]
ON  [pi].[PID] = [p].[PID]
INNER JOIN [dbo].[profile_styles] [ps]
ON  [ps].[PID] = [p].[PID]
WHERE
    [p].[date_created] > DATEADD(yy, -2, GETDATE())
    AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY
    [p].[shopper_id];

I get the results:

shopper_id  instrument_id   style_id
53D5444535434747A935E207C9EDD96A    35  845
53D5444535434747A935E207C9EDD96A    35  291
53D5444535434747A935E207C9EDD96A    17  845
53D5444535434747A935E207C9EDD96A    17  291

I am not using a group by yet because I am unsure how to apply it with the columns I have. I am also unsure if the duplicates are because of the type of join I am using, or because I am not using a group by.

At any rate, I would like to ask for some help to be able to figure out what modifications to my query I need to make in order to have an output that looks like:

shopper_id  instrument_id   style_id
53D5444535434747A935E207C9EDD96A    35  845
53D5444535434747A935E207C9EDD96A    17  291

Thank you in advance for your help.

Upvotes: 1

Views: 35

Answers (2)

xQbert
xQbert

Reputation: 35323

Maybe...

We assign a row number to each style and instrument for each PID. Then not only do we join by PID, we join by Row_number which guarantees when a pID has 2 instruments and two styles, we still only get 2 records; instead of 4.

A full outer join is used as I don't know if you want to see situations where 2 instruments exists but only 1 style or 2 styles exist but only 1 instrument.

SELECT [p].[shopper_id]
     , [pi].[instrument_id]
     , [ps].[style_id]
FROM [dbo].[profile] [p]
INNER JOIN (SELECT A.*, row_number() over (partition by PID order by instrument_ID) RN 
            FROM [dbo].[profile_instruments] A) [pi]
  ON  [pi].[PID] = [p].[PID]
FULL OUTER JOIN (SELECT A.*, Row_number() over (partition by PID order by style_ID) RN 
                 FROM [dbo].[profile_styles] A)  [ps]
  ON  [ps].[PID] = [p].[PID]
 AND [PI].[RN] = [PS].[RN]
WHERE [p].[date_created] > DATEADD(yy, -2, GETDATE())
  AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
ORDER BY [p].[shopper_id];

We may be able to use a coalesce on the PI.RN or PS.RN or both if you are willing to accept that when styles and instruments have different counts that matching to a single value in either table is acceptable

Example:

PID STYLE_ID      PID Instrument_ID
1   A             1   Z
1   B

The above should return
1 A Z
1 B

But maybe you want
1 A Z
1 B Z

This may work if we change AND [PI].[RN] = [PS].[RN] to AND coalesce([PI].[RN],1) = coalesce([PS].[RN],1) but testing needs to occur. As if one side is devoid of all records for a pid, you'd still get null on one table.

Upvotes: 2

Try making a natural join, would look similar to this:

SELECT
    [p].[shopper_id],
    [pi].[instrument_id],
    [ps].[style_id]
FROM
    [dbo].[profile] [p],
    [dbo].[profile_instruments] [pi],
    [dbo].[profile_styles] [ps]
WHERE
    [p].[date_created] > DATEADD(yy, -2, GETDATE())
    AND [p].[shopper_id] = '53D5444535434747A935E207C9EDD96A'
    AND [pi].[PID] = [p].[PID]
    AND [ps].[PID] = [p].[PID]
ORDER BY
    [p].[shopper_id];

Upvotes: 0

Related Questions