Reputation: 553
I have ~8000 entries in table "adr" and want to join it with some other tables. However, this leads to 200k+ results despite using DISTINCT. The culprit here is "stort", as the second query works as expected. However, I can't figure out what I'm doing wrong with the first one. Is it the double gemeinde_id?
200k+ results:
SELECT DISTINCT str.name, adr.hausnummer, stort.name, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, stadt_ortsteil stort, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id
AND adr.postleitzahl_id = plz.postleitzahl_id
AND str.gemeinde_id = gem.gemeinde_id
AND gem.gemeinde_id = stort.gemeinde_id
8000 results:
SELECT DISTINCT str.name, adr.hausnummer, gem.name, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id
AND adr.postleitzahl_id = plz.postleitzahl_id
AND str.gemeinde_id = gem.gemeinde_id
Upvotes: 1
Views: 42
Reputation: 3523
I find a good approach to troubleshooting DISTINCT or GROUP BY options is to set up your query using GROUP BY and then put a HAVING clause to look for duplicates. That can shed some light on large datasets. In this example, we are going to check for Stort.name duplicates, but you could do the same for other fields:
SELECT str.NAME, adr.hausnummer, count(stort.NAME) StortDuplicates, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, stadt_ortsteil stort, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id
AND adr.postleitzahl_id = plz.postleitzahl_id
AND str.gemeinde_id = gem.gemeinde_id
AND gem.gemeinde_id = stort.gemeinde_id
GROUP BY str.NAME, adr.hausnummer, stort.NAME, plz.postleitzahl, adr.PKUID
HAVING (COUNT(stort.NAME) > 1)
Upvotes: 1