MapEngine
MapEngine

Reputation: 553

SQL/SQLite: unexpected amount of results despite using DISTINCT

I have ~8000 entries in table "adr" and want to join it with some other tables. However, this leads to 200k+ results despite using DISTINCT. The culprit here is "stort", as the second query works as expected. However, I can't figure out what I'm doing wrong with the first one. Is it the double gemeinde_id?

200k+ results:

SELECT DISTINCT str.name, adr.hausnummer, stort.name, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, stadt_ortsteil stort, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id 
AND adr.postleitzahl_id = plz.postleitzahl_id 
AND str.gemeinde_id = gem.gemeinde_id 
AND gem.gemeinde_id = stort.gemeinde_id

8000 results:

SELECT DISTINCT str.name, adr.hausnummer, gem.name, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id 
AND adr.postleitzahl_id = plz.postleitzahl_id 
AND str.gemeinde_id = gem.gemeinde_id

Upvotes: 1

Views: 42

Answers (1)

Greg Viers
Greg Viers

Reputation: 3523

I find a good approach to troubleshooting DISTINCT or GROUP BY options is to set up your query using GROUP BY and then put a HAVING clause to look for duplicates. That can shed some light on large datasets. In this example, we are going to check for Stort.name duplicates, but you could do the same for other fields:

SELECT str.NAME, adr.hausnummer, count(stort.NAME) StortDuplicates, plz.postleitzahl, adr.PKUID
FROM adresse adr, strasse str, gemeinde gem, stadt_ortsteil stort, postleitzahl plz
WHERE adr.strasse_id = str.strasse_id
    AND adr.postleitzahl_id = plz.postleitzahl_id
    AND str.gemeinde_id = gem.gemeinde_id
    AND gem.gemeinde_id = stort.gemeinde_id
GROUP BY str.NAME, adr.hausnummer, stort.NAME, plz.postleitzahl, adr.PKUID
HAVING (COUNT(stort.NAME) > 1)

Upvotes: 1

Related Questions