kspearrin
kspearrin

Reputation: 10738

LEFT JOIN Causes Duplicate Result

I've put together a simplified version of the problem here.

Scenario

Schema

-- -------------------------------------
-- User
-- -------------------------------------

CREATE TABLE [User] (
  [Id] VARCHAR(50) NOT NULL
);

INSERT INTO [User]
  VALUES ('user_1');
INSERT INTO [User]
  VALUES ('user_2');

-- -------------------------------------
-- Folder
-- -------------------------------------

CREATE TABLE [Folder] (
    [Id] VARCHAR(50) NOT NULL,
    [UserId] VARCHAR(50) NOT NULL
);

-- Each user has a private folder
INSERT INTO [Folder]
  VALUES ('user1_folder', 'user_1');
INSERT INTO [Folder]
  VALUES ('user2_folder', 'user_2');

-- -------------------------------------
-- File
-- -------------------------------------

CREATE TABLE [File] (
  [Id] VARCHAR(50) NOT NULL,
  [UserId] VARCHAR(50) NULL
);

-- Private files
INSERT INTO [File]
  VALUES ('user1_file1', 'user_1');
INSERT INTO [File]
  VALUES ('user1_file2', 'user_1');

INSERT INTO [File]
  VALUES ('user2_file1', 'user_2');
INSERT INTO [File]
  VALUES ('user2_file2', 'user_2');

-- Shared files
INSERT INTO [File]
  VALUES ('shared_file1', NULL);
INSERT INTO [File]
  VALUES ('shared_file2', NULL);
INSERT INTO [File]
  VALUES ('shared_file3', NULL);
-- UPDATE: new case
INSERT INTO [File]
  VALUES ('shared_file4', NULL);

-- -------------------------------------
-- FolderFile Association
-- -------------------------------------

CREATE TABLE [FolderFile] (
    [FolderId] VARCHAR(50) NOT NULL,
    [FileId] VARCHAR(50) NOT NULL
);

-- User 1 puts some files in his private folders
INSERT INTO [FolderFile]
  VALUES ('user1_folder', 'user1_file');
INSERT INTO [FolderFile]
  VALUES ('user1_folder', 'shared_file1');
INSERT INTO [FolderFile]
  VALUES ('user1_folder', 'shared_file2');

-- User 2 puts some files in his private folders
INSERT INTO [FolderFile]
  VALUES ('user2_folder', 'user2_file');
INSERT INTO [FolderFile]
  VALUES ('user2_folder', 'shared_file1');
-- UPDATE: new case
INSERT INTO [FolderFile]
  VALUES ('user2_folder', 'shared_file4');

Desired Result

I want to see all private and shared files for the given @UserId (user_1 in this case), along with the associated private folder for that user (if it has one). Note that a folder is optional for a user's file.

Attempted Query #1

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
  F.[Id] AS [FileId],
  F.[UserId] AS [FileUserId],
  FO.[Id] AS [FolderId]
FROM
  [File] AS F
LEFT JOIN
  [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
  [Folder] FO ON FO.[Id] = FOF.[FolderId]
WHERE
  F.[UserId] IS NULL
  OR F.[UserId] = @UserId

Result #1

FileId          FileUserId    FolderId
=========================================
user1_file1     user_1        NULL
user1_file2     user_1        NULL
shared_file1    NULL          user1_folder
shared_file1    NULL          user2_folder   <== bad result
shared_file2    NULL          user1_folder
shared_file3    NULL          NULL
shared_file4    NULL          user2_folder   <== bad result

Attempted Query #2

Adding another condition to the Folder JOIN ON.

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
  F.[Id] AS [FileId],
  F.[UserId] AS [FileUserId],
  FO.[Id] AS [FolderId]
FROM
  [File] AS F
LEFT JOIN
  [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
  [Folder] FO ON FO.[Id] = FOF.[FolderId] AND FO.[UserId] = @UserId -- Add another condition here on UserId
WHERE
  F.[UserId] IS NULL
  OR F.[UserId] = @UserId

Result #2

FileId          FileUserId    FolderId
=========================================
user1_file1     user_1        NULL
user1_file2     user_1        NULL
shared_file1    NULL          user1_folder
shared_file1    NULL          NULL           <== bad result
shared_file2    NULL          user1_folder
shared_file3    NULL          NULL
shared_file4    NULL          NULL

Analysis

As you can see above, the association for user_2's folder is causing an extra row to be returned for user_1. I do not want this row to be included.

If the FolderFile table had a UserId on it I think I could limit it there with a conditional, but it doesn't. The UserId is implied through the associated Folder. The LEFT JOINs on the association are causing it to propagate null and pass the conditions below it.

I'm running out of ideas, though it's probably something obvious :)

UPDATE #1

I added a new case with shared_file4, which is in a folder for user_2, but not user_1. It should be included in the results for both users.

INSERT INTO [File]
  VALUES ('shared_file4', NULL);

INSERT INTO [FolderFile]
  VALUES ('user2_folder', 'shared_file4');

Upvotes: 2

Views: 157

Answers (9)

kspearrin
kspearrin

Reputation: 10738

Though many of the answers given were able to return the result set that I needed, they didn't offer very good query plans. I ultimately decided the best way to achieve my desired result-set with optimal performance was to denormalize the FolderFile table and add a UserId column. With this column now available, I can stick with standard joins similar to my original query attempts, filtering the user at the FolderFile LEFT JOIN.

CREATE TABLE [FolderFile] (
    [FolderId] VARCHAR(50) NOT NULL,
    [FileId] VARCHAR(50) NOT NULL,
    [UserId] VARCHAR(50) NOT NULL
);

SELECT
  F.[Id] AS [FileId],
  F.[UserId] AS [FileUserId],
  FO.[Id] AS [FolderId]
FROM
  [File] AS F
LEFT JOIN
  [FolderFile] FOF ON FOF.[FileId] = F.[Id] AND FOF.[UserId] = @UserId
LEFT JOIN
  [Folder] FO ON FO.[Id] = FOF.[FolderId]
WHERE
  F.[UserId] IS NULL
  OR F.[UserId] = @UserId

Upvotes: 0

SqlZim
SqlZim

Reputation: 38023

Using outer apply():

declare @UserId varchar(50) = 'user_1';
select 
    FileId = F.Id
  , FileUserId = F.UserId
  , FolderId = x.Id
from [File] as F 
  outer apply (
    select top 1 
      Id = case when fo.UserId = @UserId then fo.Id else null end
    from [FolderFile] fof 
      left join [Folder] fo
        on fo.Id = fof.FolderId
    where fof.FileId = f.id
    order by case when fo.UserId = @UserId then 0 else 1 end
    ) as x
where (f.UserId = @UserId or f.UserId is null);

rextester demo: http://rextester.com/YEAMZ12650

returns:

+--------------+------------+--------------+
|    FileId    | FileUserId |   FolderId   |
+--------------+------------+--------------+
| user1_file1  | user_1     | NULL         |
| user1_file2  | user_1     | NULL         |
| shared_file1 | NULL       | user1_folder |
| shared_file2 | NULL       | user1_folder |
| shared_file3 | NULL       | NULL         |
| shared_file4 | NULL       | NULL         |
+--------------+------------+--------------+

Upvotes: 1

TriV
TriV

Reputation: 5148

I change your #query2 a bit with Row_number

;WITH temp AS 
 (
    SELECT
    F.[Id] AS [FileId],
    F.[UserId] AS [FileUserId],
    FO.[Id] AS [FolderId],
    row_number() OVER(PARTITION BY F.Id ORDER BY FO.Id DESC) AS Rn 
   -- if folder id not null (it means that folder belongs to @UserId) 
   --> it will be the first priority -- Rownumber = 1
   FROM
    [File] AS F
   LEFT JOIN
    [FolderFile] FOF ON FOF.[FileId] = F.[Id]
   LEFT JOIN
    [Folder] FO ON FO.[Id] = FOF.[FolderId] AND FO.[UserId] = @UserId 
   WHERE
    F.[UserId] IS NULL
    OR F.[UserId] = @UserId
 )
 SELECT t.FileId, t.FileUserId, t.FolderId FROM temp t
 WHERE rn = 1

Upvotes: 1

HABO
HABO

Reputation: 15816

This provides the desired answers for the given data.

-- Sample data.
declare @Users as Table ( UserId VarChar(50) not NULL );
insert into @Users ( UserId ) values
  ( 'user_1' ), ( 'user_2' );

declare @Folders as Table ( FolderId VarChar(50) not NULL, UserId VarChar(50) not NULL );
insert into @Folders ( FolderId, UserId ) values
  ( 'user1_folder', 'user_1' ), ( 'user2_folder', 'user_2' );

declare @Files as Table ( FileId VarChar(50) not NULL, UserId VarChar(50) NULL );
insert into @Files ( FileId, UserId ) values
  -- Private files.
  ( 'user1_file1', 'user_1' ), ( 'user1_file2', 'user_1' ),
  ( 'user2_file1', 'user_2' ), ( 'user2_file2', 'user_2' ),
  -- Shared files.
  ( 'shared_file1', NULL ), ( 'shared_file2', NULL ), ( 'shared_file3', NULL ), ( 'shared_file4', NULL );

declare @FileFolders as Table ( FolderId VarChar(50) not NULL, FileId VarChar(50) not NULL );
insert into @FileFolders ( FolderId, FileId ) values
  -- User 1 puts some files in his private folders.
  ( 'user1_folder', 'user1_file' ), ( 'user1_folder', 'shared_file1' ), ( 'user1_folder', 'shared_file2' ),
  -- User 2 puts some files in his private folders.
  ( 'user2_folder', 'user2_file' ), ( 'user2_folder', 'shared_file1' ), ( 'user2_folder', 'shared_file4' );

select * from @Users;
select * from @Files;
select * from @Folders;
select * from @FileFolders;

-- Query the data.
declare @UserId as VarChar(50) = 'user_1';

with
  -- Any file with a   UserId   of   NULL is shared.
  --   If it is in the given user's folders then pick up the folder.
  SharedFiles as (
    select Fi.FileId, Max( Fi.UserId ) as UserId, Max( Fo.FolderId ) as FolderId
      from @Files as Fi left outer join
        @FileFolders as FF on FF.FileId = Fi.FileId left outer join
        @Folders as Fo on Fo.FolderId = FF.FolderId and ( Fo.UserId = @UserId or FF.FileId is NULL )
      where Fi.UserId is NULL
      group by Fi.FileId ),
  -- Any file with a non-NULL   UserId   is private.
  --   Find all of the given user's files.
  PrivateFiles as (
    select Fi.FileId, Fi.UserId, Fo.FolderId
      from @Files as Fi left outer join
        @FileFolders as FF on FF.FileId = Fi.FileId left outer join
        @Folders as Fo on Fo.FolderId = FF.FolderId and Fo.UserId = @UserId
      where Fi.UserId = @UserId )
   select FileId, UserId, FolderId
     from PrivateFiles
   union all
   select FileId, UserId, FolderId
     from SharedFiles;

Upvotes: 2

tep
tep

Reputation: 833

The extra row is being introduced by the first LEFT JOIN to FolderFile, not by the LEFT JOIN to Folder, so adding an extra join condition on the Folder table won't eliminate that row.

You can, however, filter the row out in the WHERE clause. Since you want rows for either shared files with no folders e.g. shared_file3, or rows for shared files linked to folders belonging to @UserId, just add the following filter to query 1.

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
  F.[Id] AS [FileId],
  F.[UserId] AS [FileUserId],
  FO.[Id] AS [FolderId]
FROM
  [File] AS F
LEFT JOIN
  [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
  [Folder] FO ON FO.[Id] = FOF.[FolderId]
WHERE (F.[UserId] IS NULL OR F.[UserId] = @UserId)
  AND (FO.UserId IS NULL OR FO.UserId = @UserId)

Update

If you only want to include private folders which belong to the user, but still include all shared files, then the following should do the trick.

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
  F.[Id] AS [FileId],
  F.[UserId] AS [FileUserId],
  FO.[Id] AS [FolderId]
FROM
  [File] AS F
LEFT JOIN
  [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
  [Folder] FO ON FO.[Id] = FOF.[FolderId] AND FO.UserId = @UserId
WHERE (F.[UserId] IS NULL OR F.[UserId] = @UserId)
  AND (FO.UserId IS NULL OR FO.UserId = @UserId)

Upvotes: 0

Amy B
Amy B

Reputation: 110071

There's already some good answers about LEFT JOIN. I decided to play around with CTE's and see if I could make a very expressive answer instead. Enjoy:

DECLARE @UserId VARCHAR(50) = 'user_1'

;WITH
PrivateFile (FileId) AS
(
  SELECT Id FROM [File]
  WHERE UserId = @UserId
),
SharedFile (FileId) AS
(
  SELECT Id FROM [File]
  WHERE UserId is null
),
AnyFile ([FileId]) AS
(
  SELECT FileId FROM PrivateFile
  UNION
  SELECT FileId FROM SharedFile
),
PrivateFolder (FolderId) AS
(
  SELECT Id FROM [Folder]
  WHERE UserId = @UserId
),
AssociatedFolder ([FileId], [FolderId]) AS
(
  SELECT ff.FileId, ff.FolderId
  FROM [FolderFile] ff
    JOIN PrivateFolder pf ON ff.FolderId = pf.FolderId
)
SELECT f.[FileId], @UserId as UserId, fo.[FolderId]
FROM AnyFile as f
  LEFT JOIN AssociatedFolder as fo ON f.[FileId] = fo.[FileId]

Upvotes: 2

Pham X. Bach
Pham X. Bach

Reputation: 5432

You could try this:

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
    F.[Id] AS [FileId],
    F.[UserId] AS [FileUserId],
    FO.[Id] AS [FolderId]
FROM
    [File] AS F
LEFT JOIN
    [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
    [Folder] FO ON FO.[Id] = FOF.[FolderId]
WHERE
    FO.[UserId] = @UserId
    OR F.[UserId] = @UserId;

Edited: For shared_file_3, which has NULL user_id, and not in any folder, If for your design it will appear in all user shared files then you should use:

DECLARE @UserId VARCHAR(50) = 'user_1'

SELECT
    F.[Id] AS [FileId],
    F.[UserId] AS [FileUserId],
    FO.[Id] AS [FolderId]
FROM
    [File] AS F
LEFT JOIN
    [FolderFile] FOF ON FOF.[FileId] = F.[Id]
LEFT JOIN
    [Folder] FO ON FO.[Id] = FOF.[FolderId]
WHERE
    FO.[UserId] = @UserId
    OR F.[UserId] = @UserId
    OR (FO.[UserId] IS NULL AND F.[UserId] IS NULL);

Upvotes: 0

toonice
toonice

Reputation: 2236

Please try the following...

DECLARE @UserId VARCHAR( 50 ) = 'user_1'

SELECT File.Id AS FileId,
       File.UserId AS FileUserId,
       Folder.Id AS FolderId
FROM File
LEFT JOIN FolderFile ON FolderFile.FileId = File.Id
LEFT JOIN Folder ON Folder.Id = FolderFile.FolderId
                AND Folder.UserId = @UserId
WHERE ( File.UserId IS NULL OR
        File.UserId = @UserId )
  AND ( FileUserId IS NOT NULL OR
        FolderId IS NOT NULL )

Your second attempted query was close, you just needed to add an exclusionary clause for where both fields are NULL / inclusionary clause where at least one field is NULL.

If you have any questions or comments, then please feel free to post a Comment accordingly.

Further Reading

https://www.w3schools.com/sql/sql_null_values.asp

Upvotes: 0

yobioo
yobioo

Reputation: 89

[FolderFile] Table shows that "shared_file1" exists in both user1_folder and user2_folder. Is this correct?

(Sorry, I don't have sufficient point to add a comment.)

Upvotes: 0

Related Questions