Reputation: 25
I have a set of tables aimed at representing a profile storage system. I have created a view that should display all the important parts of this profile, that being things like the name, email, number of followers and the list of activities they are a part of.
However, I've noticed that the values in the activities cell are duplicating. The same thing appears multiple times when each activity name should only appear once. In the below table the Nulls at the bottom are intentional.
Weight | Height | Activities | TotalFollowing | TotalFollowers |
---|---|---|---|---|
68 | 170 | Horse Riding, Horse Riding, Horse Riding, Horse Riding, Horse Riding, Horse Riding, Horse Riding, Horse Riding, Bird Watching, Bird Watching, Bird Watching, Bird Watching, Bird Watching, Bird Watching, Bird Watching, Bird Watching, Hunting, Hunting, Hunting, Hunting, Hunting, Hunting, Hunting, Hunting | 4 | 2 |
63 | 179 | Horse Riding, Horse Riding, Horse Riding, Hiking, Hiking, Hiking, Swimming, Swimming, Swimming, Fishing, Fishing, Fishing | 1 | 3 |
72 | 130 | NULL | 3 | 1 |
NULL | NULL | NULL | 1 | 2 |
There is a user that is identified by it's MemberID, a link table called User_Activity which only contains MemberID and ActivityID, the latter of which points to the activity table that holds the name for an activity under a certain ID.
ActivityID | ActivityName |
---|---|
1 | Horse Riding |
2 | Hiking |
3 | Bird Watching |
4 | Backpacking |
5 | Swimming |
6 | Fishing |
7 | Hunting |
And the user activity tables holds these.
MemberID | ActivityID |
---|---|
1 | 1 |
1 | 3 |
1 | 7 |
2 | 1 |
2 | 2 |
2 | 5 |
2 | 6 |
The SQL for this view is as follows.
I assume the issue has come from the fact that the user follows table and the references I use there and up pointing back to the user, so it counts their activities again and adds them to the column depending on how many followers they have, since these two example records repeat a different number of times.
CREATE VIEW [View_Profile_Information] AS
SELECT U.MemberID,
U.FirstName,
U.LastName,
L.Town,
L.County,
L.Country,
U.Email,
U.AboutMe,
U.DistanceUnit,
U.ActivityTimePreference,
U.Weight,
U.Height,
STRING_AGG(A.ActivityName, ', ') AS Activities,
COUNT(DISTINCT UF.Follows_MemberID) AS TotalFollowing,
COUNT(DISTINCT UF2.This_MemberID) AS TotalFollowers
FROM CW1.[User] U
LEFT JOIN CW1.[Location] L
ON U.LocationID = L.LocationID
LEFT JOIN CW1.[User_Activity] UA
ON U.MemberID = UA.MemberID
LEFT JOIN CW1.[Activity] A
ON UA.ActivityID = A.ActivityID
LEFT JOIN CW1.[User_Follows] UF
ON U.MemberID = UF.This_MemberID
LEFT JOIN CW1.[User_Follows] UF2
ON U.MemberID = UF2.Follows_MemberID
GROUP BY U.MemberID, U.FirstName, LastName, Town, County, Country, Email, AboutMe, DistanceUnit, ActivityTimePreference, Weight,
Height
Upvotes: 0
Views: 96
Reputation: 9907
What you have is a query that contains multiple independent one-to-many joins. This is near guaranteed to cause data duplication in the results. Using DISTINCT
or GROUP BY
to try to resolve the duplicates is rarely the correct solution for unexpected duplicate data.
What you need to do is to moved each relationship branch into separate subqueries. In this case those subqueries can either be included directly in the select list or wrapped up in a CROSS APPLY
.
Something like:
CREATE VIEW [View_Profile_Information] AS
SELECT
U.MemberID, U.FirstName, U.LastName,
L.Town, L.County, L.Country,
U.Email, U.AboutMe, U.DistanceUnit,
U.ActivityTimePreference, U.Weight, U.Height,
(
SELECT STRING_AGG(A.ActivityName, ', ')
FROM CW1.[User_Activity] UA
JOIN CW1.[Activity] A
ON UA.ActivityID = A.ActivityID
WHERE UA.MemberID = U.MemberID
) AS Activities,
(
SELECT COUNT(*)
FROM CW1.[User_Follows] UF
WHERE UF.This_MemberID = U.MemberID
) AS TotalFollowing,
(
SELECT COUNT(*)
FROM CW1.[User_Follows] UF2
WHERE UF2.Follows_MemberID = U.MemberID
) AS TotalFollowers
FROM CW1.[User] U
LEFT JOIN CW1.[Location] L
ON U.LocationID = L.LocationID
You can also use CROSS APPLY
to calculate the aggregated values separately down in the FROM
section of the query, so that the final select list is cleaner and more readable.
CREATE VIEW View_Profile_Information AS
SELECT
U.MemberID, U.FirstName, U.LastName,
L.Town, L.County, L.Country,
U.Email, U.AboutMe, U.DistanceUnit,
U.ActivityTimePreference, U.Weight, U.Height,
A.Activities, TFG.TotalFollowing, TFW.TotalFollowers
FROM CW1.User U
LEFT JOIN CW1.Location L
ON U.LocationID = L.LocationID
CROSS APPLY (
SELECT STRING_AGG(A.ActivityName, ', ') AS Activities
FROM CW1.User_Activity UA
JOIN CW1.Activity A
ON UA.ActivityID = A.ActivityID
WHERE UA.MemberID = U.MemberID
) AS A
CROSS APPLY (
SELECT COUNT(*) AS TotalFollowing
FROM CW1.User_Follows UF
WHERE UF.This_MemberID = U.MemberID
) AS TFG
CROSS APPLY (
SELECT COUNT(*) AS TotalFollowers
FROM CW1.User_Follows UF2
WHERE UF2.Follows_MemberID = U.MemberID
) AS TFW
Note that I have removed the DISTINCT
s and GROUP BY
, and COUNT(*)
is now just counting rows.
Upvotes: 1