Michael S.
Michael S.

Reputation: 1

How to find the predecessor of the predecessor

I am trying to use an SQL-query to find the predecessor municipality of a municipality or, if available, other predecessor municipalities.

Here is my input table (MUNICIPALMERGE):

municipality_id valid_year_from valid_year_to predecessor_municipality_id
1000 1990 1995 NULL
1001 1996 2000 1000
1002 2001 2005 1001
1003 1990 2005 NULL

My goal is a table showing which municipality was valid in the corresponding year. For the example above, the correct table would be as follows:

year municipality_id valid_municipality_id
1990 1000 1000
1991 1000 1000
1992 1000 1000
1993 1000 1000
1994 1000 1000
1995 1000 1000
1996 1000 1001
1997 1000 1001
1998 1000 1001
1999 1000 1001
2000 1000 1001
2001 1000 1002
2002 1000 1002
2003 1000 1002
2004 1000 1002
2005 1000 1002
1990 1001 1000
1991 1001 1000
1992 1001 1000
1993 1001 1000
1994 1001 1000
1995 1001 1000
1996 1001 1001
1997 1001 1001
1998 1001 1001
1999 1001 1001
2000 1001 1001
2001 1001 1002
2002 1001 1002
2003 1001 1002
2004 1001 1002
2005 1001 1002
1990 1002 1000
1991 1002 1000
1992 1002 1000
1993 1002 1000
1994 1002 1000
1995 1002 1000
1996 1002 1001
1997 1002 1001
1998 1002 1001
1999 1002 1001
2000 1002 1001
2001 1002 1002
2002 1002 1002
2003 1002 1002
2004 1002 1002
2005 1002 1002
1990 1003 1003
1991 1003 1003
1992 1003 1003
1993 1003 1003
1994 1003 1003
1995 1003 1003
1996 1003 1003
1997 1003 1003
1998 1003 1003
1999 1003 1003
2000 1003 1003
2001 1003 1003
2002 1003 1003
2003 1003 1003
2004 1003 1003
2005 1003 1003

This means that the respective predecessor or successor municipalities, if any, should be found for each municipality.

Is this somehow possible and if so, how?

Thank you very much for your help.

I tried the Query below, but got some null values in the Valid_minicipality_id column.

WITH
  YEARS AS
  (
    SELECT
      1990 AS YEAR
    UNION ALL
    SELECT
      YEAR+1
    FROM
      YEARS
    WHERE
      YEAR+1<=2005--YEAR(GETDATE())
  )
  --SELECT * FROM YEARS;
, YEARS_MUNICIPALITY AS
  (
    SELECT
      Y.YEAR
    , MM.MUNICIPALITY_ID
    , ISNULL(MM.PREDECESSOR_MUNICIPALITY_ID, MM.MUNICIPALITY_ID) AS PREDECESSOR_MUNICIPALITY_ID
    FROM
      YEARS Y
    INNER JOIN
      MUNICIPALMERGE MM
    ON
      Y.YEAR BETWEEN MM.VALID_YEAR_FROM AND MM.VALID_YEAR_TO
  )
  --SELECT * FROM YEARS_MUNICIPALITY;
, ALL_YEARS_MUNICIPALITY AS
  (
    SELECT
      Y.YEAR
    , MM.MUNICIPALITY_ID
    , MM.PREDECESSOR_MUNICIPALITY_ID
    FROM
      YEARS Y
    CROSS JOIN
      MUNICIPALMERGE MM
  )
--SELECT * FROM ALL_YEARS_MUNICIPALITY;
SELECT
  AYM.YEAR
, AYM.MUNICIPALITY_ID
, YM.MUNICIPALITY_ID AS VALID_MUNICIPALITY_ID
FROM
  ALL_YEARS_MUNICIPALITY AYM
LEFT JOIN
  YEARS_MUNICIPALITY YM
ON
  AYM.YEAR=YM.YEAR
AND AYM.MUNICIPALITY_ID=YM.PREDECESSOR_MUNICIPALITY_ID
ORDER BY
  AYM.MUNICIPALITY_ID
, AYM.YEAR;

Upvotes: 0

Views: 79

Answers (2)

ValNik
ValNik

Reputation: 5916

I propose to consider an example where we first build a complete table of predecessors and followers by going down and up the tree.
The query covers all the years for each municipality_id for which there is data in the table. We don't extract these years specifically.
To avoid repetitions, we use the accumulated path.

The result of traversing the tree at the end of the text.

To extend the result for each year, you can use generate_series or a recursive query.
Both options seem quite acceptable to me.

See examples:

Traversing tree and expand with generate_series

with r as(
  select municipality_id, municipality_id valid_municipality_id
     ,valid_year_from,valid_year_to
     ,predecessor_municipality_id
     ,cast(concat('-',municipality_id,'-') as varchar(1000)) path
  from municipalmerge
  union all
  select r.municipality_id, t.municipality_id valid_municipality_id
     ,t.valid_year_from,t.valid_year_to
     ,t.predecessor_municipality_id
     ,cast(concat(path,t.municipality_id,'-') as varchar(1000)) path
  from r inner join municipalmerge t 
    on  t.municipality_id=r.predecessor_municipality_id
     or t.predecessor_municipality_id=r.valid_municipality_id
  where charindex(concat('-',t.municipality_id,'-'),path)=0
)
select value as year,municipality_id,valid_municipality_id 
from r 
cross apply generate_series(r.valid_year_from,r.valid_year_to) t
order by municipality_id,value 

Example with 2 recursive queries.

with r as(
  select municipality_id, municipality_id valid_municipality_id
     ,valid_year_from,valid_year_to
     ,predecessor_municipality_id
     ,cast(concat('-',municipality_id,'-') as varchar(1000)) path
  from municipalmerge
  union all
  select r.municipality_id, t.municipality_id valid_municipality_id
     ,t.valid_year_from,t.valid_year_to
     ,t.predecessor_municipality_id
     ,cast(concat(path,t.municipality_id,'-') as varchar(1000)) path
  from r inner join municipalmerge t 
    on  t.municipality_id=r.predecessor_municipality_id
     or t.predecessor_municipality_id=r.valid_municipality_id
  where charindex(concat('-',t.municipality_id,'-'),path)=0
)
,allYears as(
  select municipality_id,valid_municipality_id
     ,valid_year_from year
     ,valid_year_to
  from r
  union all
  select municipality_id,valid_municipality_id
     ,year+1 year
     ,valid_year_to
  from allYears
  where (year+1)<=valid_year_to
  
)
select year,municipality_id,valid_municipality_id 
from allyears 
order by municipality_id,year  

Details:
First recursion, before expanding for every year. Recursion depth is equal max number of merges for municipality_id.

with r as(  
  -- in anchor we take municipality_id and use id as root thru recursion
  select municipality_id, municipality_id valid_municipality_id
     ,valid_year_from,valid_year_to
     ,predecessor_municipality_id
     ,cast(concat('-',municipality_id,'-') as varchar(1000)) path
     ,1 lvl -- for debug only
  from municipalmerge
  union all
  select r.municipality_id, t.municipality_id valid_municipality_id
     ,t.valid_year_from,t.valid_year_to
     ,t.predecessor_municipality_id
         -- accumulate path
     ,cast(concat(path,t.municipality_id,'-') as varchar(1000)) path
     ,lvl+1 lvl
  from r inner join municipalmerge t  -- join up and down
    on  t.municipality_id=r.predecessor_municipality_id
     or t.predecessor_municipality_id=r.valid_municipality_id
     -- exclude traversed points
  where charindex(concat('-',t.municipality_id,'-'),path)=0
)
select * from r order by municipality_id,valid_year_from 

municipality_id valid_municipality_id valid_year_from valid_year_to predecessor_municipality_id path lvl
1000 1000 1990 1995 null -1000- 1
1000 1001 1996 2000 1000 -1000-1001- 2
1000 1002 2001 2005 1001 -1000-1001-1002- 3
1001 1000 1990 1995 null -1001-1000- 2
1001 1001 1996 2000 1000 -1001- 1
1001 1002 2001 2005 1001 -1001-1002- 2
1002 1000 1990 1995 null -1002-1001-1000- 3
1002 1001 1996 2000 1000 -1002-1001- 2
1002 1002 2001 2005 1001 -1002- 1
1003 1003 1990 2005 null -1003- 1

fiddle

Expanded output

year municipality_id valid_municipality_id
1990 1000 1000
1991 1000 1000
1992 1000 1000
1993 1000 1000
1994 1000 1000
1995 1000 1000
1996 1000 1001
1997 1000 1001
1998 1000 1001
1999 1000 1001
2000 1000 1001
2001 1000 1002
2002 1000 1002
2003 1000 1002
2004 1000 1002
2005 1000 1002
1990 1001 1000
1991 1001 1000
1992 1001 1000
1993 1001 1000
1994 1001 1000
1995 1001 1000
1996 1001 1001
1997 1001 1001
1998 1001 1001
1999 1001 1001
2000 1001 1001
2001 1001 1002
2002 1001 1002
2003 1001 1002
2004 1001 1002
2005 1001 1002
1990 1002 1000
1991 1002 1000
1992 1002 1000
1993 1002 1000
1994 1002 1000
1995 1002 1000
1996 1002 1001
1997 1002 1001
1998 1002 1001
1999 1002 1001
2000 1002 1001
2001 1002 1002
2002 1002 1002
2003 1002 1002
2004 1002 1002
2005 1002 1002
1990 1003 1003
1991 1003 1003
1992 1003 1003
1993 1003 1003
1994 1003 1003
1995 1003 1003
1996 1003 1003
1997 1003 1003
1998 1003 1003
1999 1003 1003
2000 1003 1003
2001 1003 1003
2002 1003 1003
2003 1003 1003
2004 1003 1003
2005 1003 1003

Upvotes: 0

Thom A
Thom A

Reputation: 95924

I'm not really sure this is the most performant way of doing it, but it did get there.

I use GENERATE_SERIES to get all the calendar years, not an rCTE. If you aren't on SQL Server 2022+, then use a tally. For a small amount of data, an rCTE is "fine" for generating the rows, but it gets slow fast so I avoid them for generating such data in all scenarios and use a set-based method.

I then do use an rCTE to transverse the hierarchy of your data. I then get use a further CTE put the values on both sides (so you have 1000,1001 and 1001,1000) and get the DISTINCT results.

Finally, I then JOIN that to your original data, based on the value of the "valid" municipality. This results in the following:

CREATE TABLE dbo.YourTable (municipality_id int,
                            valid_year_from int,
                            valid_year_to int,
                            predecessor_municipality_id int);
GO

INSERT INTO dbo.YourTable (municipality_id,
                           valid_year_from,
                           valid_year_to,
                           predecessor_municipality_id)
VALUES (1000, 1990, 1995, NULL),
       (1001, 1996, 2000, 1000),
       (1002, 2001, 2005, 1001),
       (1003, 1990, 2005, NULL);
GO
WITH MinMaxYears AS(
    SELECT MIN(valid_year_from) AS MinYear,
           MAX(valid_year_to) AS MaxYear
    FROM dbo.YourTable),
Years AS(
    SELECT GS.value AS CalendarYear
    FROM MinMaxYears MMY
         CROSS APPLY GENERATE_SERIES(MMY.MinYear, MMY.MaxYear, 1) GS),
Municipalities AS(
    SELECT YT.municipality_id,
           YT.municipality_id AS Current_municipality_id,
           YT.predecessor_municipality_id
    FROM dbo.YourTable YT
    UNION ALL
    SELECT M.municipality_id,
           YT.municipality_id AS Current_municipality_id,
           YT.predecessor_municipality_id
    FROM dbo.YourTable YT
         JOIN Municipalities M ON YT.municipality_id = M.predecessor_municipality_id),
ValidMunicipalities AS(
    SELECT DISTINCT V.municipality_id,
                    V.valid_municipality_id
    FROM Municipalities M
         CROSS APPLY (VALUES(M.municipality_id, M.Current_municipality_id),
                            (M.Current_municipality_id, M.municipality_id))V(municipality_id,valid_municipality_id))
SELECT Y.CalendarYear,
       VM.municipality_id,
       VM.valid_municipality_id
FROM ValidMunicipalities VM
     CROSS JOIN Years Y
     JOIN dbo.YourTable YT ON Y.CalendarYear BETWEEN YT.valid_year_from AND YT.valid_year_to
                          AND VM.valid_municipality_id = YT.municipality_id
ORDER BY VM.municipality_id,
         VM.valid_municipality_id,
         Y.CalendarYear;
GO
DROP TABLE dbo.YourTable;

Upvotes: 0

Related Questions