Reputation: 5146
I am running SSMS and trying to write a query to return a big sheet that captures all data from various table for "full market" analysis. However, I either get too little or WAY too much data.
My largest table has 1,025,650 rows. I got this number by doing
SELECT COUNT(*)
FROM Table_name
on all my tables. That was the largest number.
What I want: as mentioned above, my desired result would likely be a result with 1,025,650 rows with multiple columns that come from various tables.
What I have tried: (sorry for big blocks)
SELECT DISTINCT
dbo.PUBACC_EN.unique_system_identifier, dbo.PUBACC_EN.call_sign,
dbo.PUBACC_EN.email, dbo.PUBACC_EN.phone,
dbo.PUBACC_EN.first_name,dbo.PUBACC_EN.last_name,
dbo.PUBACC_EN.entity_name, dbo.PUBACC_EN.state,
dbo.PUBACC_HD.radio_service_code,
dbo.PUBACC_HD.grant_date, dbo.PUBACC_HD.last_action_date,
dbo.PUBACC_HD.expired_date,
dbo.PUBACC_HD.effective_date, dbo.PUBACC_BF.buildout_date,
dbo.PUBACC_FR.transmitter_make,
dbo.PUBACC_FR.transmitter_model, dbo.PUBACC_FR.frequency_assigned,
dbo.PUBACC_LO.lat_degrees, dbo.PUBACC_LO.lat_minutes,
dbo.PUBACC_LO.lat_seconds,
dbo.PUBACC_LO.lat_direction, dbo.PUBACC_LO.long_degrees,
dbo.PUBACC_LO.long_minutes,
dbo.PUBACC_LO.long_seconds, dbo.PUBACC_LO.long_direction
FROM
dbo.PUBACC_EN
LEFT OUTER JOIN
dbo.PUBACC_HD ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_FR ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_FR.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_LO ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_LO.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_BF ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_BF.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_PA ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_PA.unique_system_identifier
The above code returns 9,931,904 rows.
So I then tried a smaller query:
SELECT DISTINCT
dbo.PUBACC_HD.radio_service_code, dbo.PUBACC_FR.transmitter_make,
dbo.PUBACC_LO.lat_degrees, dbo.PUBACC_LO.lat_minutes, dbo.PUBACC_LO.lat_seconds,
dbo.PUBACC_LO.lat_direction, dbo.PUBACC_LO.long_degrees, dbo.PUBACC_LO.long_minutes,
dbo.PUBACC_LO.long_seconds, dbo.PUBACC_LO.long_direction
FROM
dbo.PUBACC_EN
LEFT OUTER JOIN
dbo.PUBACC_HD ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_HD.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_FR ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_FR.unique_system_identifier
LEFT OUTER JOIN
dbo.PUBACC_LO ON dbo.PUBACC_EN.unique_system_identifier = dbo.PUBACC_LO.unique_system_identifier
This returned only 421,432 rows.
My problem and TL;DR :**
I think I am getting a Cartesian Product or my joins are not correct. What would be the solution to my problem?
Upvotes: 0
Views: 66
Reputation: 960
I think your query should be done with inner join
s, if your maximum row count should be around the row count of the largest involved table. At least your description sounds to me like this. There might also be a need to constrain your result with a distinct
clause.
Upvotes: 0
Reputation: 1402
You have multiple rows in one of your tables (and definitely in one of the PUBACC_BF PUBACC_PA tables) that have the same value in the "unique_system_identifier" field.
You also want to stop using "distinct" as that hides the true Cartesian and thus hides faults in the query.
Upvotes: 0
Reputation: 34784
It's clear that you have multiple values for some unique_system_identifier
values in some of your tables. If this is a surprise you'll have to dig into the individual tables to find out where the multiples are coming from. You can run a query like this on each table to view the multiples:
;with cte AS (SELECT *
,COUNT(*) OVER(PARTITION BY unique_system_identifier) AS CT
FROM DBO.PUBACC_HD
)
SELECT *
FROM cte
WHERE CT > 1
ORDER BY CT DESC, unique_system_identifier
The resolution is to either remove unexpected multiples or to alter your join criteria to ensure a 1:1 join.
Upvotes: 1