xuT
xuT

Reputation: 318

Query to find all couples of countries separated by another nation

Thanks in advance for your time

I have the following table from the mondial database (Website, Documentation).

CREATE TABLE borders
(Country1 VARCHAR(4),
 Country2 VARCHAR(4),
 Length FLOAT, 
 CONSTRAINT CHECK (Length > 0),
 CONSTRAINT BorderKey PRIMARY KEY (Country1,Country2));

The table doesn't contain reciprocal values (it only contains Country1, Country2 or Country2, Country1 to define a border).

I need to make a query that outputs all couples of nations that are not neighbors but are neighbors to neighbors (all couples of nations that are separated by another nation).

Country1 and Country2 contain the nations codes like "F" for France, "I" for Italy and so on. Here is an example row of the output:

RSM || F

RSM is the code for the Republic of San Marino (San Marino is an enclave entirely surrounded by Italy). RSM is not a neighbor to France obviously but Italy is, so the output contains the couple RSM, F and it also contains a similar couple for all the other nations that are neighbors to Italy.

I have spent several hours trying to figure out a solution but I'm far from reaching a solution and I already have many questions, here's what I did:

I started by making a query to find out all neighbors to a certain country.

SELECT Country1
FROM borders
WHERE Country2 = "RSM"
UNION
SELECT Country2
FROM borders
WHERE Country1 = "RSM"

This obviously only outputs "I" which is the code for Italy (so it's correct). I already have a question: is there a better way to do this or is this ok?

Then I took it to the next step and made a query to find all the neighbors to the neighbors previously found like this:

SELECT Country1
FROM borders
WHERE Country2 IN (
    SELECT Country1
    FROM borders
    WHERE Country2 = "RSM"
    UNION
    SELECT Country2
    FROM borders
    WHERE Country1 = "RSM"
)
UNION
SELECT Country2
FROM borders
WHERE Country1 IN (
    SELECT Country1
    FROM borders
    WHERE Country2 = "RSM"
    UNION
    SELECT Country2
    FROM borders
    WHERE Country1 = "RSM"
)

So I do basically the same thing but instead of looking for the neighbors of a specific country I search for all the neighbors of a certain group of countries. The query is not finished because I need to remove the starting country and all its neighbors from the output so the query becomes this:

SELECT "RSM", Country1 
FROM (
    SELECT Country1
    FROM borders
    WHERE Country2 IN (
        SELECT Country1
        FROM borders
        WHERE Country2 = "RSM"
        UNION
        SELECT Country2
        FROM borders
        WHERE Country1 = "RSM"
    )
    UNION
    SELECT Country2
    FROM borders
    WHERE Country1 IN (
        SELECT Country1
        FROM borders
        WHERE Country2 = "RSM"
        UNION
        SELECT Country2
        FROM borders
        WHERE Country1 = "RSM"
    )
) tmp
WHERE tmp.Country1 NOT IN (
    SELECT Country1
    FROM borders
    WHERE Country2 = "RSM"
    UNION
    SELECT Country2
    FROM borders
    WHERE Country1 = "RSM"
) AND tmp.Country1 != "RSM"

This query works for a single starting country (RSM) but I need to output all the distinct (meaning no F, RSM couple in the output if RSM, F is present) couples like stated above and I also think that the query I developed is really bad because it keeps repeating the same query over and over. I have already searched ways to avoid this and I found interesting solutions such as the WITH tmp AS query clause but I'm using MySQL and the WITH AS clause is not supported.

So what do you experts think? Is this at least near the right way to go? Am I missing something obvious?

Thanks in advance for your time and sorry if this terrible query gave you a headache.

EDIT 1: I made a SQL Fiddle containing the table, the data and my query in case you want to easily be able to run queries. I hope it'll be helpful to somebody.

Upvotes: 3

Views: 1644

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270011

Assuming that the data has reciprocal values, meaning both ('France', 'Italy') and ('Italy', 'France') are in the data, then you can do this basically with a join and filtering:

select b1.country1, b2.country2
from borders b1 join
     borders b2
     on b1.country2 = b2.country1
where not exists (select 1
                  from borders b
                  where b.country1 = b1.country1 and b.country2 = b2.country2
                 );

EDIT: Without reciprocal values, I would just create a view and use the view for the query:

create view v_borders as
    select country1, country2 from borders union all
    select country2, country1 from borders;

And then use the view in the query. Also you can do this within the query, it is just messy because MySQL does not support common table expressions (CTEs).

Upvotes: 1

xuT
xuT

Reputation: 318

User jpw solved the problem in a comment.

Many users suggested to make a view of the table in order to make the couples or nations symmetrical like this:

CREATE VIEW borders_symmetrical AS
SELECT Country1, Country2 FROM borders
UNION ALL
SELECT Country2, Country1 FROM borders;

That indeed is a great suggestion in my opinion and it shows how views can be really useful. Now the query becomes much much easier:

SELECT DISTINCT b1.Country1, b2.Country2 
FROM borders_symmetrical b2
JOIN borders_symmetrical b1
ON b2.Country1 = b1.Country2 
WHERE b2.Country2 <> b1.Country1
    #AND (b1.Country1 = 'RSM' OR b2.Country2 = 'RSM') # Debug a single nation
    AND b2.country2 NOT IN (
        SELECT Country2
        FROM borders_symmetrical 
        WHERE Country1 = b1.Country1
    ) AND b1.Country1 < b2.Country2

I have reposted this because jpw didn't submit and answer but just commented on the post.

Thanks again to everyone for your precious time, you helped me a lot and made me understand a couple of important things about SQL and Databases.

Upvotes: 1

Tim Meyer
Tim Meyer

Reputation: 12600

This is the shortest I could come up with, without modifying the database you are selecting from:

SELECT DISTINCT reciprocalBorders1.first, reciprocalBorders2.second, reciprocalBorders1.second
FROM   (SELECT Country1 first, Country2 second FROM borders UNION
        SELECT Country2 first, Country1 second FROM borders) reciprocalBorders1,
       (SELECT Country1 first, Country2 second FROM borders UNION
        SELECT Country2 first, Country1 second FROM borders) reciprocalBorders2
WHERE reciprocalBorders1.second = reciprocalBorders2.first
  AND reciprocalBorders1.first < reciprocalBorders2.first
  AND reciprocalBorders1.first <> reciprocalBorders2.second
  AND NOT EXISTS(
    SELECT 1
    FROM   (SELECT Country1 first, Country2 second FROM borders UNION
            SELECT Country2 first, Country1 second FROM borders) reciprocalBorders3
    WHERE   reciprocalBorders3.first = reciprocalBorders1.first
      AND   reciprocalBorders3.second = reciprocalBorders2.second
  )
ORDER BY 1, 2, 3;

The third column contains the country which separates the countries in the first and second column. Obviously the FROM parts could be shortened by using a view. Each FROM part adds the reversed table to itself in order to have 'A|F' and 'F|A' entries in one table.

Concerning the WHERE part:

  1. The first line joins the two reciprocal tables so that pairs like 'A|CH' and 'CH|F' are connected together.
  2. The second line makes sure that if the combination 'A|CH', 'CH|F' is analysed, 'F|CH', 'CH|A' is not analysed
  3. The third line makes sure that you don't receive results like 'A|CH', 'CH|A'
  4. The fourth line makes sure that you don't receive results which have a direct border.

The SELECT DISTINCT makes sure you don't get double results for countries A and B which each share a border with two or more of the same countries, e.g. Austria is connected to Suisse and Germany, both of which share a border to France, but you don't want two Austria, France entries

Upvotes: 0

Related Questions