Reputation: 29209
I have the following file name changing log table.
ChangeNameLog(Date, OldName, NewName)
Primary key: Date, OldName
The data of the table looks like
Date OldName NewName 1/1 aaa bbb 1/2 bbb ccc 1/3 ccc bbb 1/4 bbb ddd 2/1 xx yy 2/2 yy zz
(The file name aaa was changed to bbb, and then to ccc, bbb, ddd later
The file name xx was changed to yy, and then to zz)
I want to get the oldest name for all the new names. The result will look like
Date NewName OldestName 1/2 bbb aaa 1/3 ccc aaa 1/4 ddd aaa 2/1 yy xx 2/2 zz xx
Is anyway to write a Transact-SQL (Version 2008 is fine) without using cursor to loop the logging table?
The following SQL can be used to prepare the data.
declare @log table (
Date Date, OldName varchar(20), NewName varchar(20) not null
primary key (Date, OldName)
);
-- The real table also have the following CK
-- create unique index IX_CK on @log (Date, NewName)
insert into @log values
('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')
,('2012-01-03', 'xx', 'yy')
,('2012-02-02', 'yy', 'zz')
,('2012-02-03', 'zz', 'xx')
;
Upvotes: 1
Views: 182
Reputation: 29209
My own solution:
declare @log table (
Date Date, OldName varchar(20), NewName varchar(20)
primary key (Date, OldName)
);
insert into @log values
('2012-01-01', 'aaa', 'bbb')
,('2012-01-02', 'bbb', 'ccc')
,('2012-01-03', 'ccc', 'bbb')
,('2012-01-04', 'bbb', 'ddd')
,('2012-01-05', 'ddd', 'eee')
,('2012-01-03', 'xx', 'yy')
,('2012-02-02', 'yy', 'zz')
,('2012-02-03', 'zz', 'xx')
;
;with m as (
select Date, OldName, NewName, 1 as L
from @log
union all
select l.Date, m.OldName, l.NewName, L + 1
from @log l join m on l.Date > m.Date and l.OldName = m.NewName
)
select *
from m
where L = (select MAX(l) from m m1 where NewName = m.NewName and Date = m.Date)
order by 1
Output:
The following result shows that the two original names are aaa and xx.
Date Orig Name L 2012-01-01 aaa bbb 1 2012-01-02 aaa ccc 2 2012-01-03 aaa bbb 3 2012-01-03 xx yy 1 2012-01-04 aaa ddd 4 2012-01-05 aaa eee 5 2012-02-02 xx zz 2 2012-02-03 xx xx 3
Upvotes: 0
Reputation: 15852
Or, for a different headache that plods through forwards:
declare @Helga as table ( Date datetime, OldName varchar(10), NewName varchar(10) )
insert into @Helga ( Date, OldName, NewName ) values
( '1/1/12', 'aaa', 'bbb' ), ( '1/2/12', 'bbb', 'ccc' ), ( '1/3/12', 'ccc', 'bbb' ),
( '1/4/12', 'bbb', 'ddd' ), ( '2/1/12', 'xx', 'yy' ), ( '2/2/12', 'yy', 'zz' )
select * from @Helga
; with Edmund as
( -- Get the oldest names.
select L.Date, L.OldName, L.NewName, L.OldName as Methuselah, cast( 0 as bigint ) as Ethyl
from @Helga as L left outer join
@Helga as R on R.NewName = L.OldName
where R.NewName is NULL
union all
-- Add newer names one generation at a time.
select H.Date, H.OldName, H.NewName, H.Methuselah, H.Sandy
from ( select iH.Date, iH.OldName, iH.NewName, Ed.Methuselah, Row_Number() over ( order by iH.Date ) as Sandy
from Edmund as Ed cross join
@Helga as iH where iH.OldName = Ed.NewName and iH.Date > Ed.Date ) as H
where H.Sandy = 1
)
select Date, OldName, NewName, Methuselah
from Edmund
order by Methuselah, Date
Of course, this would be easier and more reliable if you assigned a consistent identification to each file that you persist across name changes. When you have NY > NJ > MA > CA crossing paths with MN > MA > CA > AL all bets are off. If the first sequence carried FileId
1 and the second was all 2 you could still sort out the details.
Upvotes: 0
Reputation: 19356
Setup:
declare @logtable table (Date date, OldName nvarchar(200), NewName varchar(200))
insert into @logtable values (convert (date, '1/1/12', 1), 'aaa', 'bbb')
insert into @logtable values (convert (date, '1/2/12', 1), 'bbb', 'ccc')
insert into @logtable values (convert (date, '1/3/12', 1), 'ccc', 'bbb')
insert into @logtable values (convert (date, '1/4/12', 1), 'bbb', 'ddd')
insert into @logtable values (convert (date, '2/1/12', 1), 'xx', 'yy')
insert into @logtable values (convert (date, '2/2/12', 1), 'yy', 'zz')
Now on to recursive CTE. First part (backtrack) recurses over log table matching past names and keeping information on chain top (EndName). Second part, starters, assigns row numbers to EndName by changedates, and finally only the oldest records are displayed. This part might be expressed in more ways, using not exist on changedate, or keeping original name in every log entry, but i would investigate another approach only if this code proves to be too slow.
; with backtrack as (
select NewName EndName, NewName, OldName, Date
from @logtable
union all
select EndName, [@logtable].NewName, [@logtable].OldName, [@logtable].Date
from @logtable inner join backtrack
on [@logtable].NewName = backtrack.OldName
and [@logtable].Date < backtrack.Date
),
starters as (
select EndName NewName, OldName, Date, ROW_NUMBER() over (partition by EndName order by Date) RowNumber
from backtrack
)
select NewName, OldName
from starters
where RowNumber = 1
I hope that this will help you.
Upvotes: 3