Joe
Joe

Reputation: 82

Calculating age from a varchar column with discrepancies

Hoping you had a restful Easter. Appreciate if you could advice/help me in the following. (Using Function/without function)

Below are my DataSet, Desired Output (With derived Age using the DOB specifications from the rules)

Your help is needed in (Please note that i looking for solution in MSSQL environment):-
1. Coming up with Age field. (I tried the following script but it didn't work as it is not dynamic enough to include all the DOB rules, i also attached an oracle script which worked as a reference for you guys)

SELECT 
[ID],
[DOB],
 'age' = DATEDIFF(HOUR,(CONVERT(date,(CASE WHEN ([DOB] like '99/%/%') THEN (REPLACE([DOB],'99','01'))
               ELSE [DOB] END),103)),GETDATE())/8766 
from [Sample]

Sample_Dataset

create table Sample (
  Id  Varchar (50),
  DOB Varchar (50))

  insert into Sample(Id, DOB)
  Values 
  ('38603', '24/02/1969'),
  ('38605', '22/09/1969'),
  ('36356', '17/03/1954'),
  ('36374', '17/05/1975'),
  ('36441', '17/08/1961'),
  ('1a', '10/05/9999'),
  ('1b', '10/99/9999'),
  ('1c', '99/99/9999'),
  ('2a', '--/--/1935'),
  ('2b', '00/00/1935'),
  ('2c', '88/88/1935'),
  ('2d', '99/99/1935'),
  ('3a', '10/--/1935'),
  ('3b', '10/00/1935'),
  ('3c', '10/88/1935'),
  ('3d', '10/99/1935'),
  ('4a', '--/09/1935'),
  ('4b', '00/09/1935'),
  ('4c', '88/09/1935'),
  ('4d', '99/09/1935')

Desired output

ID     | DOB        | Age (As of 05-03-2018; dd-mm-yyyy)      
38603  | 24/02/1969 | 49  --Everything is known      
38605  | 22/09/1969 | 48        
36356  | 17/03/1954 | 63        
36374  | 17/05/1975 | 42        
36441  | 17/08/1961 | 56    
1a     | 10/05/9999 |null --unknown year
1b     | 10/99/9999 |null
1c     | 99/99/9999 |null
2a     | --/--/1935 |82   --unknown day and month 
2b     | 00/00/1935 |82
2c     | 88/88/1935 |82
2d     | 99/99/1935 |82
3a     | 10/--/1935 |82   --unknown month but known year 
3b     | 10/00/1935 |82
3c     | 10/88/1935 |82
3d     | 10/99/1935 |82
4a     | --/09/1935 |82   --unknown day but known month 
4b     | 00/09/1935 |82
4c     | 88/09/1935 |82
4d     | 99/09/1935 |82    

Rules:- As you can see in the above 5 scenarios in the comments

  1. Everything is known (use the stated DOB to calculate the age)
  2. Unknown Year (put age as null as the year is known)
  3. Unknown day and month (Use 01/07 for the unknown dd/mm and the stated yyyy)
  4. Unknown month but known day (Use 07 for the unknown mm and the stated dd/07/yyyy)
  5. Unknown day but known month (Use 15 for the unknown dd and the stated 15/mm/yyyy)

Solution in Oracle

Creating a function first (Tried replicating this logic in T-SQL but unsuccessful, hence i am here)

create or replace function check_dt(in_date in VARCHAR2, in_format in VARCHAR2 default 'DD/MM/YYYY')
RETURN NUMBER
IS
V_DATE DATE;
V_STATUS INTEGER;
BEGIN

 SELECT TO_DATE(in_date,in_format)
 INTO V_DATECASE  
 FROM DUAL;

 V_STATUS := 0;
 RETURN V_STATUS;  
 EXCEPTION WHEN OTHERS THEN
 V_STATUS := SQLCODE; 
         RETURN V_STATUS;
        END;

        select check_dt('11/30/2017') from dual;
        select TO_DATE('15/--/9999','DD/MM/YYYY') from dual;

select id, dob,
       case when check_dt(dob) = -1843 --not valid month, default it to July (07)
               THEN substr(dob,1,2)||'/07'||substr(dob,7,4) 
            when check_dt(dob) = -01847 -- day of month must between 1 and last day of month
               THEN '1/07/'||substr(dob,7,4)
            WHEN check_dt(dob) = 0 and to_date(dob,'dd/mm/yyyy') > sysdate
               THEN NULL
            WHEN check_dt(dob) = -0183 -- date not valid for month
               THEN '15/'||substr(dob,4)
            ELSE
               THEN dob 
            END New_dob
from SAMPLE; 

Any help would be much appreciated. Thank you very Much.

Upvotes: 1

Views: 407

Answers (3)

Jay Shankar Gupta
Jay Shankar Gupta

Reputation: 6088

SQL SERVER

SELECT id,
       CASE WHEN YEAR(GETDATE())-REVERSE(LEFT(REVERSE(DOB), CHARINDEX('/', REVERSE(DOB)) - 1)) > = 0 
            THEN 
              YEAR(GETDATE())-REVERSE(LEFT(REVERSE(DOB), CHARINDEX('/', REVERSE(DOB)) - 1))
          ELSE 
             NULL
       END AS Age
FROM Sample

Solution For Your Question

WITH CTE AS
(
 SELECT id,
       CASE WHEN ISNUMERIC(REVERSE(LEFT(REVERSE(DOB), CHARINDEX('/', REVERSE(DOB)) - 1))) = 1  THEN
                REVERSE(LEFT(REVERSE(DOB), CHARINDEX('/', REVERSE(DOB)) - 1))
            ELSE
                NULL
        END
       AS Year,
       CASE WHEN ISNUMERIC(LEFT(DOB, CHARINDEX('/', DOB) - 1)) = 1 THEN
                 LEFT(DOB, CHARINDEX('/', DOB) - 1)
            ELSE
                NULL
       END AS DAY,
       CASE WHEN ISNUMERIC(SUBSTRING(DOB,CHARINDEX('/',DOB)+1, CHARINDEX('/',DOB,CHARINDEX('/',DOB)+1) -CHARINDEX('/',DOB)-1)) = 1 THEN
            CASE WHEN SUBSTRING(DOB,CHARINDEX('/',DOB)+1, CHARINDEX('/',DOB,CHARINDEX('/',DOB)+1) -CHARINDEX('/',DOB)-1) >= 1 AND SUBSTRING(DOB,CHARINDEX('/',DOB)+1, CHARINDEX('/',DOB,CHARINDEX('/',DOB)+1) -CHARINDEX('/',DOB)-1)  <= 12 THEN
                SUBSTRING(DOB,CHARINDEX('/',DOB)+1, CHARINDEX('/',DOB,CHARINDEX('/',DOB)+1) -CHARINDEX('/',DOB)-1)
                ELSE 
                NULL
            END
            ELSE
                NULL
        END AS MONTH
FROM Sample),CTE1 AS
(
  SELECT id,
         year,
         month,
         CASE WHEN DAY IS NOT NULL THEN
              CASE WHEN DAY >= 1 AND DAY <= DAY(EOMONTH(year+'-'+month+'-01')) THEN
                DAY
              ELSE
                NULL
              END  
         ELSE NULL
         END AS Day
  FROM CTE
)
,CTE2 AS
(
SELECT id,
           CASE WHEN YEAR IS NULL
                       THEN NULL
                     ELSE
                       CASE WHEN DAY IS NULL AND MONTH IS NULL THEN '01/07'
                            WHEN MONTH IS NULL AND DAY IS NOT NULL THEN CAST(day AS VARCHAR)+'/07'
                            WHEN MONTH IS NOT NULL AND DAY IS NULL THEN '15/'+CAST(MONTH AS VARCHAR)
                            ELSE CAST(day AS VARCHAR)+'/'+CAST(MONTH AS VARCHAR)
                       END
                       + '/'+CAST(YEAR AS VARCHAR)
                END
        AS DOB
FROM CTE1
)
SELECT id,DOB,
   CASE WHEN DOB IS NOT NULL
        THEN 
          CASE WHEN DATEDIFF (day,  CONVERT(DATE, DOB, 103),CONVERT(DATE,GETDATE(),103)) >=0
           THEN FLOOR(DATEDIFF (day, CONVERT(DATE, DOB, 103), CONVERT(DATE,GETDATE(),103)) / 365.2425)
           ELSE
              NULL
          END
      ELSE 
         DOB
   END AS Age
FROM CTE2

DEMO LIVE

http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=3714d33cacb02c3fce4f0868c9d0990b

Upvotes: 2

EzLo
EzLo

Reputation: 14189

You can use the following. I used different CTE's to show you the progression towards obtaining the date of birth from your varchar DOB. I also changed the table to a temporary one.

 IF OBJECT_ID('tempdb..#Sample') IS NOT NULL
    DROP TABLE #Sample

create table #Sample (
  Id Int,
  DOB Varchar (50))

insert into #Sample(Id, DOB)
Values 
(38603, '24/02/1969'),
(38605, '22/09/1969'),
(36356, '17/03/1954'),
(36374, '17/05/1975'),
(36441, '17/08/1961'),
(119, '10/05/9999'),
(114, '10/99/9999'),
(132, '99/99/9999'),
(25125, '--/--/1935'),
(2323, '00/00/1935'),
(2512, '88/88/1935'),
(2156, '99/99/1935'),
(368, '10/--/1935'),
(34135, '10/00/1935'),
(3435, '10/88/1935'),
(3241, '10/99/1935'),
(4512, '--/09/1935'),
(4161, '00/09/1935'),
(4312, '88/09/1935'),
(456, '99/09/1935')

;WITH ParsedBirth AS
(
    SELECT
        S.Id,
        S.DOB,
        Year = SUBSTRING(S.DOB, 7, 4),
        Month = SUBSTRING(S.DOB, 4, 2),
        Day = SUBSTRING(S.DOB, 1, 2)
    FROM
        #Sample AS S
),
ParsedBirthInteger AS
(
    SELECT
        P.Id,
        P.DOB,
        Year = CASE WHEN ISNUMERIC(P.Year) = 1 AND P.Year <> '9999' THEN CONVERT(INT, P.Year) END,
        Month = CASE 
            WHEN ISNUMERIC(P.Month) = 1 AND CONVERT(INT, P.Month) BETWEEN 1 AND 12 THEN CONVERT(INT, P.Month) 
            ELSE 7 END,
        Day = CASE 
            WHEN ISNUMERIC(P.Day) = 1 AND CONVERT(INT, P.Day) BETWEEN 1 AND 31 THEN CONVERT(INT, P.Day) 
            ELSE 15 END
    FROM
        ParsedBirth AS P
),
InferredBirth AS
(
    SELECT
        P.Id,
        P.DOB,
        InferredBirth = CONVERT(DATE, CONVERT(VARCHAR(100), P.Year * 10000 + P.Month * 100 + P.Day))
    FROM
        ParsedBirthInteger AS P
)
SELECT
    T.Id,
    T.DOB,
    T.InferredBirth,
    Age = (CONVERT(INT,CONVERT(char(8), GETDATE(),112))-CONVERT(char(8),T.InferredBirth,112))/10000
FROM
    InferredBirth AS T

Upvotes: 1

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67291

First of all:

  • It is a very bad idea to store a date in a culture dependant string format.
  • It is a very bad idea to use magic values (9999 meaning "no year").
  • Is is a most dangerous, bad idea to mix this!

The following code will transform your values to the format you should use to store this actually. You can build your age-logic from here, but I'd really recommend you, to use this approach to clean up this mess and store your data properly!

DECLARE @sample TABLE(
  Id VARCHAR(10),
  DOB VARCHAR (50))

  INSERT INTO @sample(Id, DOB)
  VALUES 
  ('38603', '24/02/1969'),
  ('38605', '22/09/1969'),
  ('36356', '17/03/1954'),
  ('36374', '17/05/1975'),
  ('36441', '17/08/1961'),
  ('1a', '10/05/9999'),
  ('1b', '10/99/9999'),
  ('1c', '99/99/9999'),
  ('2a', '--/--/1935'),
  ('2b', '00/00/1935'),
  ('2c', '88/88/1935'),
  ('2d', '99/99/1935'),
  ('3a', '10/--/1935'),
  ('3b', '10/00/1935'),
  ('3c', '10/88/1935'),
  ('3d', '10/99/1935'),
  ('4a', '--/09/1935'),
  ('4b', '00/09/1935'),
  ('4c', '88/09/1935'),
  ('4d', '99/09/1935');

--The query will split your string on the / and try to cast the values to int:

WITH Splitted AS
(
    SELECT Id
          ,DOB 
          ,CAST('<x>' + REPLACE(DOB,'/','</x><x>') + '</x>' AS XML).value('/x[1]','varchar(10)') AS DOB_Day
          ,CAST('<x>' + REPLACE(DOB,'/','</x><x>') + '</x>' AS XML).value('/x[2]','varchar(10)') AS DOB_Month
          ,CAST('<x>' + REPLACE(DOB,'/','</x><x>') + '</x>' AS XML).value('/x[3]','varchar(10)') AS DOB_Year
    FROM @sample
)
,Casted AS
(
    SELECT Id
          ,DOB
           --below SQL-Server 2012 you can use `CASE` with `ISNUMERIC` instead of TRY_CAST
          ,TRY_CAST(DOB_Day AS INT)  AS CastedDay 
          ,TRY_CAST(DOB_Month AS INT)  AS CastedMonth
          ,TRY_CAST(DOB_Year AS INT)  AS CastedYear 
    FROM Splitted
)
,Checked AS
(
    SELECT Id
          ,DOB
          --You can use further logic to get the month's days correctly (instead of the plain 31)  
          ,CASE WHEN CastedDay BETWEEN 1 AND 31 THEN CastedDay ELSE NULL END AS TheDay
          ,CASE WHEN CastedMonth BETWEEN 1 AND 12 THEN CastedMonth ELSE NULL END AS TheMonth
          ,CASE WHEN CastedYear BETWEEN 1900 AND 2100 THEN CastedYear ELSE NULL END AS TheYear
    FROM Casted
)
SELECT *
FROM Checked; 

The result

+-------+------------+--------+----------+---------+
| Id    | DOB        | TheDay | TheMonth | TheYear |
+-------+------------+--------+----------+---------+
| 38603 | 24/02/1969 | 24     | 2        | 1969    |
+-------+------------+--------+----------+---------+
| 38605 | 22/09/1969 | 22     | 9        | 1969    |
+-------+------------+--------+----------+---------+
| 36356 | 17/03/1954 | 17     | 3        | 1954    |
+-------+------------+--------+----------+---------+
| 36374 | 17/05/1975 | 17     | 5        | 1975    |
+-------+------------+--------+----------+---------+
| 36441 | 17/08/1961 | 17     | 8        | 1961    |
+-------+------------+--------+----------+---------+
| 1a    | 10/05/9999 | 10     | 5        | NULL    |
+-------+------------+--------+----------+---------+
| 1b    | 10/99/9999 | 10     | NULL     | NULL    |
+-------+------------+--------+----------+---------+
| 1c    | 99/99/9999 | NULL   | NULL     | NULL    |
+-------+------------+--------+----------+---------+
| 2a    | --/--/1935 | NULL   | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 2b    | 00/00/1935 | NULL   | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 2c    | 88/88/1935 | NULL   | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 2d    | 99/99/1935 | NULL   | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 3a    | 10/--/1935 | 10     | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 3b    | 10/00/1935 | 10     | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 3c    | 10/88/1935 | 10     | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 3d    | 10/99/1935 | 10     | NULL     | 1935    |
+-------+------------+--------+----------+---------+
| 4a    | --/09/1935 | NULL   | 9        | 1935    |
+-------+------------+--------+----------+---------+
| 4b    | 00/09/1935 | NULL   | 9        | 1935    |
+-------+------------+--------+----------+---------+
| 4c    | 88/09/1935 | NULL   | 9        | 1935    |
+-------+------------+--------+----------+---------+
| 4d    | 99/09/1935 | NULL   | 9        | 1935    |
+-------+------------+--------+----------+---------+

Upvotes: 0

Related Questions