Addi

Reputation: 23

NOT IN vs IN Do Not Return Complementary Results

Hi, I am working through example #7 from the SQLZoo tutorial "SELECT within SELECT". The question is:

"Find each country that belongs to a continent where all populations are less than 25000000. Show name, continent and population."

I get the right answer by using NOT IN and a subquery like this:

SELECT name, continent, population FROM world 
WHERE continent NOT IN (
    SELECT continent FROM world
    WHERE population > 25000000)

If, on the other hand, I use IN instead of NOT IN together with "population < 25000000", I do not get the right answer, and I cannot understand why. There is probably a simple reason that I just don't see. Can anyone explain it to me?
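Here is a minimal sketch that runs both queries side by side. It assumes SQLite through Python's standard sqlite3 module rather than the SQLZoo database, and the rows are made-up toy data, not the real world table:

```python
import sqlite3

# Made-up toy data (NOT the SQLZoo table): continent Alpha has one large
# country, continent Beta has only small ones.
ROWS = [
    ("A1", "Alpha", 1_000_000),
    ("A2", "Alpha", 2_000_000),
    ("A3", "Alpha", 50_000_000),
    ("B1", "Beta", 3_000_000),
    ("B2", "Beta", 4_000_000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", ROWS)

# The working query: drop every continent that has ANY country over 25M.
not_in_rows = conn.execute("""
    SELECT name, continent, population FROM world
    WHERE continent NOT IN (
        SELECT continent FROM world
        WHERE population > 25000000)
    ORDER BY name
""").fetchall()

# The flipped query: keep every continent that has ANY country under 25M.
in_rows = conn.execute("""
    SELECT name, continent, population FROM world
    WHERE continent IN (
        SELECT continent FROM world
        WHERE population < 25000000)
    ORDER BY name
""").fetchall()

print([r[0] for r in not_in_rows])  # ['B1', 'B2']
print([r[0] for r in in_rows])      # ['A1', 'A2', 'A3', 'B1', 'B2']
```

Alpha's large country puts Alpha into the first subquery, but its small countries also put Alpha into the second one, so the two results overlap instead of complementing each other.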

Upvotes: 1

Views: 769

Answers (4)

Arioch 'The

Reputation: 16045

Show the table declaration. It seems you use CONTINENT as a continent number (an ID). If so, check that it is declared with the PRIMARY KEY and NOT NULL options. I really suspect you just forgot about the very special meaning NULL has in SQL.

Here is an example on a Firebird 2.5.1 SQL server.

CREATE TABLE WORLD (
    CONTINENT   INTEGER,
    NAME        VARCHAR(20),
    POPULATION  INTEGER
);


INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (NULL, 'null-id', 100);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (1, 'normal 1', 10);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (2, 'normal 2', 200);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (3, 'null-pop', NULL);
INSERT INTO WORLD (CONTINENT, NAME, POPULATION) VALUES (4, 'normal 4', 110);

COMMIT WORK;

Now let's try your queries and see whether the first row, the one with CONTINENT IS NULL, shows up in either result:

SELECT continent, population FROM world
WHERE continent IN (
    SELECT continent FROM world
    WHERE population > 100)

CONTINENT   POPULATION
2           200
4           110

and then

SELECT continent, population FROM world
WHERE continent NOT IN (
    SELECT continent FROM world
    WHERE population > 100)

CONTINENT   POPULATION
1           10
3           <NULL>

By the logic of your query you treat CONTINENT as a row ID, so you should declare it NOT NULL. Then there would be no row that is invisible to both the IN and the NOT IN condition.
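This experiment can also be repeated without a Firebird server. A minimal sketch, assuming SQLite through Python's sqlite3 module (which applies the same NULL rules to IN and NOT IN here), using the same five rows; the `names` helper is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (continent INTEGER, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        (None, "null-id", 100),   # NULL continent
        (1, "normal 1", 10),
        (2, "normal 2", 200),
        (3, "null-pop", None),    # NULL population
        (4, "normal 4", 110),
    ],
)

SUBQUERY = "SELECT continent FROM world WHERE population > 100"

def names(sql):
    """Illustrative helper: set of names from a query that selects only name."""
    return {name for (name,) in conn.execute(sql)}

in_names = names(f"SELECT name FROM world WHERE continent IN ({SUBQUERY})")
not_in_names = names(f"SELECT name FROM world WHERE continent NOT IN ({SUBQUERY})")
all_names = names("SELECT name FROM world")

# NULL IN (2, 4) and NULL NOT IN (2, 4) are both UNKNOWN, and WHERE keeps
# only TRUE rows, so the NULL-continent row falls through both filters.
missing = all_names - in_names - not_in_names
print(missing)  # {'null-id'}
```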


Now let's rephrase this as a flat query:

SELECT continent, population FROM world
    WHERE NOT (population > 100)

CONTINENT   POPULATION
<NULL>      100
1           10

SELECT continent, population FROM world
    WHERE population > 100

CONTINENT   POPULATION
2           200
4           110

This time the missing row is the one with NULL in the POPULATION column.
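The same effect can be checked in a sketch that assumes SQLite via Python's sqlite3 and the five rows above. The conditions are fixed strings, so interpolating them with an f-string is harmless here; the `names` helper is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (continent INTEGER, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [(None, "null-id", 100), (1, "normal 1", 10), (2, "normal 2", 200),
     (3, "null-pop", None), (4, "normal 4", 110)],
)

def names(where):
    """Illustrative helper: sorted names of rows satisfying a fixed WHERE condition."""
    return sorted(n for (n,) in conn.execute(f"SELECT name FROM world WHERE {where}"))

big = names("population > 100")
not_big = names("NOT (population > 100)")
no_pop = names("population IS NULL")

# NULL > 100 is UNKNOWN and NOT UNKNOWN is still UNKNOWN, so 'null-pop'
# is in neither of the first two results; only IS NULL finds it.
print(big)      # ['normal 2', 'normal 4']
print(not_big)  # ['normal 1', 'null-id']
print(no_pop)   # ['null-pop']
```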


FreshPrinceOfSO then suggested using an EXISTS clause. Although it can potentially lead to a slow (inefficient) query plan, it at least masks the special meaning of NULL in SQL, because EXISTS itself only ever evaluates to TRUE or FALSE, never UNKNOWN.

SELECT continent, population FROM world w_ext
WHERE EXISTS (
   SELECT continent FROM world w_int
   WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
)

CONTINENT   POPULATION
2           200
4           110

SELECT continent, population FROM world w_ext
WHERE NOT EXISTS (
   SELECT continent FROM world w_int
   WHERE (w_int.population > 100) and (w_int.continent = w_ext.continent)
)

CONTINENT   POPULATION
<NULL>      100
1           10
3           <NULL>
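A sketch of the EXISTS / NOT EXISTS pair, again assuming SQLite via Python's sqlite3 and the same five rows, shows that every row lands in exactly one of the two results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (continent INTEGER, name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [(None, "null-id", 100), (1, "normal 1", 10), (2, "normal 2", 200),
     (3, "null-pop", None), (4, "normal 4", 110)],
)

# Correlated subquery; {neg} is filled with either "" or "NOT".
QUERY = """
    SELECT name FROM world w_ext
    WHERE {neg} EXISTS (
        SELECT continent FROM world w_int
        WHERE (w_int.population > 100) AND (w_int.continent = w_ext.continent))
"""

exists_names = sorted(n for (n,) in conn.execute(QUERY.format(neg="")))
not_exists_names = sorted(n for (n,) in conn.execute(QUERY.format(neg="NOT")))

# EXISTS is always TRUE or FALSE, so the two results partition the table.
print(exists_names)      # ['normal 2', 'normal 4']
print(not_exists_names)  # ['normal 1', 'null-id', 'null-pop']
```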

Upvotes: 0

Julian

Reputation: 112

If I'm reading this correctly, the question asks to list every country in a continent where every country has a population below 25000000, correct?

If so, look at your subquery:

SELECT continent FROM world
WHERE population > 25000000

It returns every continent that has at least one country with a population over 25000000, and excluding those continents is why your query works.

Example: continent Alpha has 5 countries. Four of them are small, but one of them, country Charlie, has a population of 50000000.

So your subquery returns continent Alpha, because country Charlie satisfies population > 25000000. The subquery finds exactly the continents you don't want, which is why NOT IN works.

On the other hand:

SELECT continent FROM world
WHERE population < 25000000

If ANY country in a continent is below 25000000, this subquery returns that continent, which is not what you want, because you want EVERY country to be below.

Example: take continent Alpha from before. Its four small countries are below 25000000, so your subquery returns Alpha, regardless of the fact that country Charlie has 50000000.

Obviously, this is not the best way to go about it, but this is why the first query worked, and the second did not.
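The Alpha/Charlie example can be sketched directly, assuming SQLite via Python's sqlite3. The rows are hypothetical (populations invented, Beta added for contrast) and `subquery_continents` is an illustrative helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")

# Hypothetical rows modelled on the example: Alpha has four small countries
# plus Charlie, and Beta (added for contrast) has only small ones.
rows = [(f"Alpha small {i}", "Alpha", 1_000_000 * i) for i in range(1, 5)]
rows.append(("Charlie", "Alpha", 50_000_000))
rows += [("Beta small 1", "Beta", 5_000_000), ("Beta small 2", "Beta", 7_000_000)]
conn.executemany("INSERT INTO world VALUES (?, ?, ?)", rows)

def subquery_continents(condition):
    """Illustrative helper: distinct continents the subquery returns."""
    sql = f"SELECT DISTINCT continent FROM world WHERE {condition} ORDER BY continent"
    return [c for (c,) in conn.execute(sql)]

# What NOT IN removes: continents with at least one large country.
unwanted = subquery_continents("population > 25000000")
# What IN keeps: continents with at least one small country.
has_small = subquery_continents("population < 25000000")

print(unwanted)   # ['Alpha']
print(has_small)  # ['Alpha', 'Beta']
```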

Upvotes: 3

hol

Reputation: 8423

Because every other continent has at least one country with a population below 25 million, and that is exactly what this query tests:

SELECT name, continent, population FROM world
WHERE continent IN (
    SELECT continent FROM world
    WHERE population < 25000000)

In words: from the list of all countries (table world), find every country whose continent has at least one country with a population below 25 million.
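The same reading can be expressed without SQL. This is a plain-Python sketch over made-up populations, assuming there are no NULL (None) values. It shows that IN with < acts like an "any" check per continent, while NOT IN with > acts like an "all" check:

```python
# Made-up populations per continent (same shape as the Alpha example).
populations = {
    "Alpha": [1_000_000, 2_000_000, 50_000_000],
    "Beta": [3_000_000, 4_000_000],
}
LIMIT = 25_000_000

# continent IN (SELECT continent ... WHERE population < LIMIT):
# keep a continent if ANY of its countries is small.
in_version = sorted(
    c for c, pops in populations.items() if any(p < LIMIT for p in pops)
)

# continent NOT IN (SELECT continent ... WHERE population > LIMIT):
# keep a continent only if NO country is large, i.e. ALL are <= LIMIT.
not_in_version = sorted(
    c for c, pops in populations.items() if all(p <= LIMIT for p in pops)
)

print(in_version)      # ['Alpha', 'Beta']
print(not_in_version)  # ['Beta']
```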

Upvotes: 2

KM.

Reputation: 103587

Why use a subquery?

Try using:

SELECT name, continent, population FROM world 
WHERE population > 25000000

and/or

SELECT name, continent, population FROM world 
WHERE population <= 25000000

The column in your condition, population, is in the FROM table, world. There is no need to query the same table world again in a subquery; just use the population column directly in the WHERE clause.
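To compare the flat filter with the question's subquery, here is a sketch that assumes SQLite via Python's sqlite3 and uses made-up rows. The flat filter judges each country on its own population only, while the subquery judges whole continents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
# Made-up rows: Alpha mixes small countries with one large one.
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [("A1", "Alpha", 1_000_000), ("A2", "Alpha", 2_000_000),
     ("Charlie", "Alpha", 50_000_000), ("B1", "Beta", 3_000_000)],
)

# Flat filter: each country is judged on its own population.
flat = [n for (n,) in conn.execute(
    "SELECT name FROM world WHERE population <= 25000000 ORDER BY name")]

# Subquery from the question: a continent is dropped if any country is large.
per_continent = [n for (n,) in conn.execute("""
    SELECT name FROM world
    WHERE continent NOT IN (
        SELECT continent FROM world WHERE population > 25000000)
    ORDER BY name""")]

print(flat)           # ['A1', 'A2', 'B1']
print(per_continent)  # ['B1']
```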

Or are you trying to do this:

SELECT name, continent, population FROM world 
WHERE continent NOT IN (
    SELECT continent FROM world
    GROUP BY continent 
    HAVING SUM(population) > 25000000)

Notice the SUM(), GROUP BY, and HAVING.
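To see how the grouped version behaves next to the per-country subquery, here is a sketch assuming SQLite via Python's sqlite3 and made-up rows. HAVING SUM(population) tests each continent's total, so a continent made of several mid-sized countries is excluded even though every one of its countries is below 25000000. The `run` helper is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
# Made-up rows: every Gamma country is under 25M, but together they exceed it.
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [("B1", "Beta", 5_000_000), ("B2", "Beta", 7_000_000),
     ("G1", "Gamma", 10_000_000), ("G2", "Gamma", 10_000_000),
     ("G3", "Gamma", 10_000_000)],
)

def run(subquery):
    """Illustrative helper: countries whose continent is NOT IN the subquery."""
    sql = f"SELECT name FROM world WHERE continent NOT IN ({subquery}) ORDER BY name"
    return [n for (n,) in conn.execute(sql)]

# Per-country test from the question (the subquery returns no rows here).
per_country = run("SELECT continent FROM world WHERE population > 25000000")
# Continent-total test using SUM(), GROUP BY and HAVING.
by_sum = run("""SELECT continent FROM world
                GROUP BY continent
                HAVING SUM(population) > 25000000""")

print(per_country)  # ['B1', 'B2', 'G1', 'G2', 'G3']
print(by_sum)       # ['B1', 'B2']
```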

Upvotes: 0
