MattR
MattR

Reputation: 5146

Not the correct number of records (joins and cartesian) - SQL

I am running SSMS and trying to write a query to return a big sheet that captures all data from various table for "full market" analysis. However, I either get too little or WAY too much data.

My largest table has 1,025,650 rows. I got this number by doing

SELECT COUNT(*)
FROM Table_name

on all my tables. That was the largest number.

What I want: as mentioned above, my desired result would likely be a result with 1,025,650 rows with multiple columns that come from various tables.

What I have tried: (sorry for big blocks)

SELECT DISTINCT
    dbo.PUBACC_EN.unique_system_identifier, dbo.PUBACC_EN.call_sign, 
    dbo.PUBACC_EN.email, dbo.PUBACC_EN.phone,   
    dbo.PUBACC_EN.first_name,dbo.PUBACC_EN.last_name, 
    dbo.PUBACC_EN.entity_name, dbo.PUBACC_EN.state, 
    dbo.PUBACC_HD.radio_service_code,
    dbo.PUBACC_HD.grant_date, dbo.PUBACC_HD.last_action_date, 
    dbo.PUBACC_HD.expired_date,
    dbo.PUBACC_HD.effective_date, dbo.PUBACC_BF.buildout_date, 
    dbo.PUBACC_FR.transmitter_make, 
    dbo.PUBACC_FR.transmitter_model, dbo.PUBACC_FR.frequency_assigned,
    dbo.PUBACC_LO.lat_degrees, dbo.PUBACC_LO.lat_minutes, 
    dbo.PUBACC_LO.lat_seconds, 
    dbo.PUBACC_LO.lat_direction, dbo.PUBACC_LO.long_degrees, 
    dbo.PUBACC_LO.long_minutes, 
    dbo.PUBACC_LO.long_seconds, dbo.PUBACC_LO.long_direction 
FROM 
    dbo.PUBACC_EN
LEFT OUTER JOIN 
    dbo.PUBACC_HD ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier 
LEFT OUTER JOIN 
    dbo.PUBACC_FR ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_FR.unique_system_identifier
LEFT OUTER JOIN 
    dbo.PUBACC_LO ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_LO.unique_system_identifier
LEFT OUTER JOIN 
    dbo.PUBACC_BF ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_BF.unique_system_identifier
LEFT OUTER JOIN 
    dbo.PUBACC_PA ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_PA.unique_system_identifier

The above code returns 9,931,904 rows.

So I then tried a smaller query:

SELECT DISTINCT
    dbo.PUBACC_HD.radio_service_code, dbo.PUBACC_FR.transmitter_make, 
    dbo.PUBACC_LO.lat_degrees, dbo.PUBACC_LO.lat_minutes, dbo.PUBACC_LO.lat_seconds, 
    dbo.PUBACC_LO.lat_direction, dbo.PUBACC_LO.long_degrees, dbo.PUBACC_LO.long_minutes, 
    dbo.PUBACC_LO.long_seconds, dbo.PUBACC_LO.long_direction 
FROM 
    dbo.PUBACC_EN
LEFT OUTER JOIN 
    dbo.PUBACC_HD ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier 
LEFT OUTER JOIN 
    dbo.PUBACC_FR ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_FR.unique_system_identifier
LEFT OUTER JOIN 
    dbo.PUBACC_LO ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_LO.unique_system_identifier

This returned only 421,432 rows.

My problem and TL;DR :**

I think I am getting a Cartesian Product or my joins are not correct. What would be the solution to my problem?

Upvotes: 0

Views: 66

Answers (3)

Chris Tophski
Chris Tophski

Reputation: 960

I think your query should be done with inner joins, if your maximum row count should be around the row count of the largest involved table. At least your description sounds to me like this. There might also be a need to constrain your result with a distinct clause.

Upvotes: 0

Paurian
Paurian

Reputation: 1402

You have multiple rows in one of your tables (and definitely in one of the PUBACC_BF PUBACC_PA tables) that have the same value in the "unique_system_identifier" field.

You also want to stop using "distinct" as that hides the true Cartesian and thus hides faults in the query.

Upvotes: 0

Hart CO
Hart CO

Reputation: 34784

It's clear that you have multiple values for some unique_system_identifier values in some of your tables. If this is a surprise you'll have to dig into the individual tables to find out where the multiples are coming from. You can run a query like this on each table to view the multiples:

;with cte AS (SELECT * 
                     ,COUNT(*) OVER(PARTITION BY unique_system_identifier) AS CT
              FROM DBO.PUBACC_HD
             )
SELECT *
FROM cte
WHERE CT > 1
ORDER BY CT DESC, unique_system_identifier 

The resolution is to either remove unexpected multiples or to alter your join criteria to ensure a 1:1 join.

Upvotes: 1

Related Questions