ca9163d9
ca9163d9

Reputation: 29209

Get the oldest name (recursively)?

I have the following file name changing log table.

ChangeNameLog(Date, OldName, NewName)
Primary key: Date, OldName

The data of the table looks like

Date OldName NewName 
1/1  aaa     bbb
1/2  bbb     ccc
1/3  ccc     bbb
1/4  bbb     ddd
2/1  xx      yy
2/2  yy      zz

(The file name aaa was changed to bbb, and then to ccc, bbb, ddd later
The file name xx was changed to yy, and then to zz)

I want to get the oldest name for all the new names. The result will look like

Date NewName OldestName
1/2  bbb     aaa
1/3  ccc     aaa
1/4  ddd     aaa
2/1  yy      xx
2/2  zz      xx

Is anyway to write a Transact-SQL (Version 2008 is fine) without using cursor to loop the logging table?

The following SQL can be used to prepare the data.

declare @log table (
    Date Date, OldName varchar(20), NewName varchar(20) not null 
    primary key (Date, OldName)
);
-- The real table also have the following CK
-- create unique index IX_CK on @log (Date, NewName)

insert into @log values
 ('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')

,('2012-01-03', 'xx',  'yy')
,('2012-02-02', 'yy',  'zz')
,('2012-02-03', 'zz',  'xx')
;

Upvotes: 1

Views: 182

Answers (3)

ca9163d9
ca9163d9

Reputation: 29209

My own solution:

declare @log table (
    Date Date, OldName varchar(20), NewName varchar(20) 
    primary key (Date, OldName)
);

insert into @log values
 ('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')

,('2012-01-03', 'xx',  'yy')
,('2012-02-02', 'yy',  'zz')
,('2012-02-03', 'zz',  'xx')
;

;with m as (
    select Date, OldName, NewName, 1 as L
    from @log 
    union all
    select l.Date, m.OldName, l.NewName, L + 1
    from @log l join m on l.Date > m.Date and l.OldName = m.NewName
)
select * 
from m
where L = (select MAX(l) from m m1 where NewName = m.NewName and Date = m.Date)
order by 1

Output:
The following result shows that the two original names are aaa and xx.

Date       Orig Name L
2012-01-01  aaa bbb 1
2012-01-02  aaa ccc 2
2012-01-03  aaa bbb 3
2012-01-03  xx  yy  1
2012-01-04  aaa ddd 4
2012-01-05  aaa eee 5
2012-02-02  xx  zz  2
2012-02-03  xx  xx  3

Upvotes: 0

HABO
HABO

Reputation: 15852

Or, for a different headache that plods through forwards:

declare @Helga as table ( Date datetime, OldName varchar(10), NewName varchar(10) )
insert into @Helga ( Date, OldName, NewName ) values
  ( '1/1/12', 'aaa', 'bbb' ), ( '1/2/12', 'bbb', 'ccc' ), ( '1/3/12', 'ccc', 'bbb' ),
  ( '1/4/12', 'bbb', 'ddd' ), ( '2/1/12', 'xx', 'yy' ), ( '2/2/12', 'yy', 'zz' )
select * from @Helga

; with Edmund as
( -- Get the oldest names.
  select L.Date, L.OldName, L.NewName, L.OldName as Methuselah, cast( 0 as bigint ) as Ethyl
    from @Helga as L left outer join
      @Helga as R on R.NewName = L.OldName
    where R.NewName is NULL
  union all
  -- Add newer names one generation at a time.
  select H.Date, H.OldName, H.NewName, H.Methuselah, H.Sandy
    from ( select iH.Date, iH.OldName, iH.NewName, Ed.Methuselah, Row_Number() over ( order by iH.Date ) as Sandy
      from Edmund as Ed cross join
        @Helga as iH where iH.OldName = Ed.NewName and iH.Date > Ed.Date ) as H
    where H.Sandy = 1
)
select Date, OldName, NewName, Methuselah
  from Edmund
  order by Methuselah, Date

Of course, this would be easier and more reliable if you assigned a consistent identification to each file that you persist across name changes. When you have NY > NJ > MA > CA crossing paths with MN > MA > CA > AL all bets are off. If the first sequence carried FileId 1 and the second was all 2 you could still sort out the details.

Upvotes: 0

Nikola Markovinović
Nikola Markovinović

Reputation: 19356

Setup:

declare @logtable table (Date date, OldName nvarchar(200), NewName varchar(200))

insert into @logtable values (convert (date, '1/1/12', 1), 'aaa', 'bbb')
insert into @logtable values (convert (date, '1/2/12', 1), 'bbb', 'ccc')
insert into @logtable values (convert (date, '1/3/12', 1), 'ccc', 'bbb')
insert into @logtable values (convert (date, '1/4/12', 1), 'bbb', 'ddd')
insert into @logtable values (convert (date, '2/1/12', 1), 'xx', 'yy')
insert into @logtable values (convert (date, '2/2/12', 1), 'yy', 'zz')

Now on to recursive CTE. First part (backtrack) recurses over log table matching past names and keeping information on chain top (EndName). Second part, starters, assigns row numbers to EndName by changedates, and finally only the oldest records are displayed. This part might be expressed in more ways, using not exist on changedate, or keeping original name in every log entry, but i would investigate another approach only if this code proves to be too slow.

; with backtrack as (
    select NewName EndName, NewName, OldName, Date
        from @logtable
    union all
    select EndName, [@logtable].NewName, [@logtable].OldName, [@logtable].Date
    from @logtable inner join backtrack
        on [@logtable].NewName = backtrack.OldName
        and [@logtable].Date < backtrack.Date
),
starters as (
    select EndName NewName, OldName, Date, ROW_NUMBER() over (partition by EndName order by Date) RowNumber
    from backtrack
)
select NewName, OldName
from starters
where RowNumber = 1

I hope that this will help you.

Upvotes: 3

Related Questions