Reputation: 129
I have a table with the columns Age
, Period
and Year
. The column Age
always starts with 0 and doesn't have a fixed maximum value (I used 'Age' 0 to 30 in this example but the range could also be 0 to 100 etc.), the values Period
and Year
only appear in certain rows at certain ages.
However at what Age
the values for Period
and Year
appear, changes and the solution should therefore be dynamic. What is the best way to fill in the NULL
values with correct Period
and Year
?
I am using SQL Server.
Age Period Year
-----------------
0 NULL NULL
1 NULL NULL
2 NULL NULL
3 NULL NULL
4 NULL NULL
5 NULL NULL
6 NULL NULL
7 NULL NULL
8 NULL NULL
9 NULL NULL
10 NULL NULL
11 NULL NULL
12 NULL NULL
13 NULL NULL
14 NULL NULL
15 NULL NULL
16 NULL NULL
17 NULL NULL
18 NULL NULL
19 NULL NULL
20 NULL NULL
21 46 2065
22 NULL NULL
23 NULL NULL
24 NULL NULL
25 NULL NULL
26 51 2070
27 NULL NULL
28 NULL NULL
29 NULL NULL
30 NULL NULL
The result should look like this, the numbers for Period
and Year
should be increased and/or decrease from the last known values for Period
and Year
.
Age Period Year
-----------------
0 25 2044
1 26 2045
2 27 2046
3 28 2047
4 29 2048
5 30 2049
6 31 2050
7 32 2051
8 33 2052
9 34 2053
10 35 2054
11 36 2055
12 37 2056
13 38 2057
14 39 2058
15 40 2059
16 41 2060
17 42 2061
18 43 2062
19 44 2063
20 45 2064
21 46 2065
22 47 2066
23 48 2067
24 49 2068
25 50 2069
26 51 2070
27 52 2071
28 53 2072
29 54 2073
30 55 2074
Here is an UPDATE to my question as I didn't specify my requirement detailed enough:
The solution should be able to handle different combinations of Age
, Period
and Year
. My start point will always be a known Age
, Period
and Year
combination. However, the combination Age
= 21, Period
= 46 and Year
= 2065 (or 26|51|2070 as the second combination) in my example is not static. The value at Age
= 21 could be anything e.g. Period
= 2 and Year
= 2021. Whatever the combination (Age
, Period
, Year
) is, the solution should fill in the gaps and finish the sequence counting up and down from the known values for Period
and Year
. If a Period
value sequence becomes negative the solutions should return NULL
values, if possible.
Upvotes: 5
Views: 1261
Reputation: 111
-- hope you can manage the syntax error. but some logic like given below should work in this case where we can make period an origin to calculate other missing values. good luck!
declare @knownperiod int;
declare @knownperiodage int;
declare @agetop int;
declare @agebottom int;
@knownperiod = select top 1 period from table1 where period is not null
@knownperiodage = select top 1 age from table1 where period is not null
while(@knownperiodage >= 0)
begin
@knownperiod = @knownperiod -1 ;
@knownperiodage = @knownperiodage -1;
update table1 set period = @knownperiod, year = YEAR(GetDate())+@knownperiod-1 where age = @knownperiodage
end
-- now for bottom age
@knownperiod = select top 1 period from table1 where period is null or year is null
@knownperiodage = select top 1 age from table1 where period is null or year is null
while(@knownperiodage <= (Select max(age) from table1))
begin
@knownperiod = @knownperiod +1 ;
@knownperiodage = @knownperiodage +1;
update table1 set period = @knownperiod, year = YEAR(GetDate())+@knownperiod-1 where age = @knownperiodage
end
Upvotes: 1
Reputation: 60472
You finally want three sequences with different start values. Then you simply need to calculate an offset and add it to age
:
with cte as
(
select age
,max(period - age) over () + age as period -- adjusted period
,max(yr - age) over () + age as yr -- adjusted yr
from #yourtable
)
select age
-- If a Period value sequence becomes negative the solutions should return NULL
,case when period >0 then period end as period
,yr
from cte
See fiddle
Upvotes: 2
Reputation: 15977
Also you can use recursive CTE (it can handle any variation of data in the table except only one that has no populated period and year at all):
WITH cte AS ( -- get any filled period and year
SELECT TOP 1 period - age delta,
[year]-period start_year
FROM tablename
WHERE period is not null and [year] is not null
), seq AS ( --get min and max age values
SELECT MIN(age) as min_age, MAX(age) as max_age
FROM tablename
), go_recursive AS (
SELECT min_age age,
min_age+delta period ,
start_year+min_age+delta year,
max_age
FROM seq
CROSS JOIN cte --That will generate the initial first row
UNION ALL
SELECT age + 1,
period +1,
year + 1,
max_age
FROM go_recursive
WHERE age < max_age --This part increments the data from first row
)
SELECT age,
period,
[year]
FROM go_recursive
OPTION (MAXRECURSION 0)
-- If you know there are some limit of rows in that kind of tables
--use this row count instead 0
Upvotes: 0
Reputation: 222482
This reads like a gaps-and-islands problem, where "empty" rows are the gaps and non-empty rows are the islands.
You want to fill the gaps. Your question is a bit tricky, because you do not clearly describe how to proceed when a gap row has both preceding and following islands - and what to do if they are not consistent.
Let me assume that you want to derive the value from the following island if there is one available, and fall back of the precedng island.
Here is an approach using lateral joins to retrieve the next and preceding non-empty row:
select t.age,
coalesce(t.period, n.period - n.diff, p.period - p.diff) period,
coalesce(t.year, n.year - n.diff, p.year - p.diff) year
from mytable t
outer apply (
select top (1) t1.*, t1.age - t.age diff
from mytable t1
where t1.age > t.age and t1.period is not null and t1.year is not null
order by t1.age
) n
outer apply (
select top (1) t1.*, t1.age - t.age diff
from mytable t1
where t1.age < t.age and t1.period is not null and t1.year is not null
order by t1.age desc
) p
order by t.age
Actually, this would probably be more efficiently performed with window functions. We can implement the very same logic by building groups of records with window counts, then doing the computation within the groups:
select
age,
coalesce(
period,
max(period) over(partition by grp2) - max(age) over(partition by grp2) + age,
max(period) over(partition by grp1) - min(age) over(partition by grp1) + age
) period,
coalesce(
year,
max(year) over(partition by grp2) - max(age) over(partition by grp2) + age,
max(year) over(partition by grp1) - min(age) over(partition by grp1) + age
) year
from (
select t.*,
count(period) over(order by age) grp1,
count(period) over(order by age desc) grp2
from mytable t
) t
order by age
Demo on DB Fiddle - both queries yield:
age | period | year --: | -----: | ---: 0 | 25 | 2044 1 | 26 | 2045 2 | 27 | 2046 3 | 28 | 2047 4 | 29 | 2048 5 | 30 | 2049 6 | 31 | 2050 7 | 32 | 2051 8 | 33 | 2052 9 | 34 | 2053 10 | 35 | 2054 11 | 36 | 2055 12 | 37 | 2056 13 | 38 | 2057 14 | 39 | 2058 15 | 40 | 2059 16 | 41 | 2060 17 | 42 | 2061 18 | 43 | 2062 19 | 44 | 2063 20 | 45 | 2064 21 | 46 | 2065 22 | 47 | 2066 23 | 48 | 2067 24 | 49 | 2068 25 | 50 | 2069 26 | 51 | 2070 27 | 52 | 2071 28 | 53 | 2072 29 | 54 | 2073 30 | 55 | 2074
Upvotes: 0
Reputation: 6015
This solution takes 4 inputs:
For any combination of inputs this code generates the requested output. If either the Age or Year is calculated to be negative then it is converted to NULL. The current limit to the list length could be increased to whatever is necessary. The technique of creating a row_number using cross applied rows is known to be very fast when generating large sequences. Above about 500 rows it's always faster than a recursion based CTE. At small row numbers there's little to no performance difference between the two techniques.
Here are the code and output to match the example data.
Inputs
declare
@list_length int=31,
@start_age int=21,
@start_period int=46,
@start_year int=2065;
Code
with
n(n) as (select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) v(n)),
tally_cte(n) as (
select row_number() over (order by (select null))
from n n1 cross join n n2 cross join n n3 cross join n n4 cross join n n5)
select p.Age,
case when p.[Period]<0 then null else p.[Period] end [Period],
case when p.[Year]<0 then null else p.[Year] end [Year]
from tally_cte t
cross apply
(select (t.n-1) [Age], (t.n-1)+(@start_period-@start_age) [Period],
(t.n-1)+(@start_year-@start_age) [Year]) p
where n<=@list_length;
Output
Age Period Year
0 25 2044
1 26 2045
2 27 2046
3 28 2047
4 29 2048
5 30 2049
6 31 2050
7 32 2051
8 33 2052
9 34 2053
10 35 2054
11 36 2055
12 37 2056
13 38 2057
14 39 2058
15 40 2059
16 41 2060
17 42 2061
18 43 2062
19 44 2063
20 45 2064
21 46 2065
22 47 2066
23 48 2067
24 49 2068
25 50 2069
26 51 2070
27 52 2071
28 53 2072
29 54 2073
30 55 2074
Suppose both the Period and the Year are less than the start Age. When the calculated values are negative the value is replaced with a NULL.
Inputs
declare
@list_length int=100,
@start_age int=10,
@start_period int=5,
@start_year int=8;
Output
Age Period Year
0 NULL NULL
1 NULL NULL
2 NULL 0
3 NULL 1
4 NULL 2
5 0 3
6 1 4
7 2 5
8 3 6
9 4 7
10 5 8
11 6 9
12 7 10
...
99 94 97
Imo this is a flexible and efficient way to meet all of the requirements. Please let me know if there are any issues.
Upvotes: 0
Reputation: 4957
Tabel creation code
create table yourtable ( AGE int , Period int, Year int )
insert into yourtable
Select 0 AS AGE , null As Period , null As Year UNION all
Select 1 AS AGE , null As Period , null As Year UNION all
Select 2 AS AGE , null As Period , null As Year UNION all
Select 3 AS AGE , null As Period , null As Year UNION all
Select 4 AS AGE , null As Period , null As Year UNION all
Select 5 AS AGE , null As Period , null As Year UNION all
Select 6 AS AGE , null As Period , null As Year UNION all
Select 7 AS AGE , null As Period , null As Year UNION all
Select 8 AS AGE , null As Period , null As Year UNION all
Select 9 AS AGE , null As Period , null As Year UNION all
Select 10 AS AGE , null As Period , null As Year UNION all
Select 11 AS AGE , null As Period , null As Year UNION all
Select 12 AS AGE , null As Period , null As Year UNION all
Select 13 AS AGE , null As Period , null As Year UNION all
Select 14 AS AGE , null As Period , null As Year UNION all
Select 15 AS AGE , null As Period , null As Year UNION all
Select 16 AS AGE , null As Period , null As Year UNION all
Select 17 AS AGE , null As Period , null As Year UNION all
Select 18 AS AGE , null As Period , null As Year UNION all
Select 19 AS AGE , null As Period , null As Year UNION all
Select 20 AS AGE , null As Period , null As Year UNION all
Select 21 AS AGE ,46 As Period ,2065 As Year UNION all
Select 22 AS AGE , null As Period , null As Year UNION all
Select 23 AS AGE , null As Period , null As Year UNION all
Select 24 AS AGE , null As Period , null As Year UNION all
Select 25 AS AGE , 51 As Period ,2070 As Year UNION all
Select 26 AS AGE , null As Period , null As Year UNION all
Select 27 AS AGE , null As Period , null As Year UNION all
Select 28 AS AGE , null As Period , null As Year UNION all
Select 29 AS AGE , null As Period , null As Year UNION all
Select 30 AS AGE , null As Period , null As Year
**Steps **
Code to fix the serial
;with tmp as
(select top 1 * from yourtable where Period is not null and year is not null)
update yourtable
set Period = (tmp.Period - tmp.age) + yourtable.age
, year = (tmp.year - tmp.age) + yourtable.age
from yourtable , tmp
OR
Declare @age int ,@Year int ,@Period int
select @age = age , @Year = year - (age +1) ,@Period = Period- (AGE +1)
from yourtable where Period is not null and year is not null
update yourtable
set Period =@Period + age
,Year =@year + age
from yourtable
Upvotes: 2
Reputation: 6685
Is the process to first calculate the increments (age -> period and age -> year) then simply add those increments to the age values? This assumes the differences between age and period, and age and year, are consistent across rows (just not filled in sometimes).
As such, you could use the following to first calculate the increments (PeriodInc, YrInc) and then select the values with the increments added (noting that if period goes negative, it gets NULL).
; WITH PeriodInc AS (SELECT TOP 1 Period - Age AS PeriodInc FROM #yourtable WHERE Period IS NOT NULL),
YrInc AS (SELECT TOP 1 Yr - Age AS YrInc FROM #yourtable WHERE Yr IS NOT NULL)
SELECT Age,
CASE WHEN (Age + PeriodInc) >= 0 THEN (Age + PeriodInc) ELSE NULL END AS Period,
Age + YrInc AS Yr
FROM #yourtable
CROSS JOIN PeriodInc
CROSS JOIN YrInc
Here is a DB_Fiddle with the code
Upvotes: 0
Reputation: 133370
Seem you have always the same increment for age and year so
select age, isnull(period,age +25) Period, isnull(year,age+44) year
from yourtable
or the standard function coalesce (as suggested by Gordon Linoff)
select age, coalesce(period,age +25) Period, coalesce(year,age+44) year
from yourtable
Upvotes: 2