ptownbro
ptownbro

Reputation: 1298

SQL Server Inner Join Creating Duplicate Records

I have two tables that I've create an inner join between and for some strange reason it's returning more records that I expected.

Table 1 has 6 records in it including 2 records of 'Version 1', 2 records of 'Version 2', and 2 records of 'Version 3'. Table 2 has 10 records in it including 10 records of 'Version 1'. If I do a inner join between them on the Version fields I would have thought I'd only get 2 records: Table 1 and Table 2 only match on 'Version 1' of which there are only 2 in Table 1.

However, the following query returns 20 records (if anything I could of understood 60 records if it's converting to cross join, which also would be wrong):

SELECT * FROM TABLE2 T2 INNER JOIN TABLE1 T1 on T2.VersionKey = T1.VersionKey

Can anyone help explain what's happening?

I have another set of tables that are VERY similar which are giving me the expect results and I can't see the difference here.

Here are the basic table structures and data.

CREATE TABLE [dbo].[Table1] (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [VersionKey] [int] NULL,
    CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([ID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Table2] (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [VersionKey] [int] NULL,
    CONSTRAINT [PK_Table2] PRIMARY KEY CLUSTERED ([ID] ASC) 
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

INSERT INTO [Table1] (VersionKey)
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 3
GO

INSERT INTO [Table2] (VersionKey)
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION     
    SELECT 1
GO

Upvotes: 1

Views: 913

Answers (2)

ptownbro
ptownbro

Reputation: 1298

Credit goes to Martin above, but to clarify the answer for others following...

As eloquently explained above by others (thanks again for your help) the results are correct and, using my example, it should produce 20 records. However, here's the explanation as to why it's happening which was my original question.

The reason for the duplicate records is because the table you are joining to has duplicate records in the field you are joining on.

enter image description here

That solves the mystery for why I didn't see this in my other tables which contain the 2100 and 3 records respectively. There are no duplicates in the table with 3 records, hence, no duplicates in the final query results.

Why you normally may not see this is because you may typically be joining off the indexed primary key field like an ID field versus what I've done here. In those cases, since those records values will always likely be unique you'll never get duplicates.

Thanks again everyone for your help.

Here's my corrected code from above again with the bug fixed to reproduce the results.

CREATE TABLE [dbo].[Table1] (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [VersionKey] [int] NULL,
    CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([ID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Table2] (
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [VersionKey] [int] NULL,
    CONSTRAINT [PK_Table2] PRIMARY KEY CLUSTERED ([ID] ASC) 
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

INSERT INTO [Table1] (VersionKey)
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 3
GO

INSERT INTO [Table2] (VersionKey)
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL   
    SELECT 1
GO

SELECT * FROM TABLE2 T2 INNER JOIN TABLE1 T1 on T2.VersionKey = T1.VersionKey

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 452977

Your setup script has a bug.

INSERT INTO [Table2] (VersionKey)
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 1 UNION      
    SELECT 1

That final UNION needs to be UNION ALL or you will remove duplicates and only insert 1 row not 10.

A fixed script and query results on SEDE is here https://data.stackexchange.com/stackoverflow/query/478676/fixed-script

It returns 20 rows, this is expected and correct.

Logically each row in the left table is compared to every row in the right table. If the join predicate evaluates to true then the match is preserved.

If the inner join predicate is something that is always true, e.g On 1=1 every row in the left ends up successfully matching every row on the right so you just end up with a cross join.

In your case both rows in Table1 with a VersionKey of 1 match 10 rows in Table2 so together they contribute 20 rows. The remaining rows in Table1 don't successfully join to anything and contribute 0 rows.

Upvotes: 1

Related Questions