JosephDoggie
JosephDoggie

Reputation: 1594

AWS Athena (Trino SQL) Convert birthdate string (mm/dd/yy) to date -- need twentieth century

AWS Athena (Trino) Convert birthdate string (mm/dd/yy) to date -- need twentieth century

I have found similar questions, but they either aren't specific to Athena, or to this two-digit to date format for a birthdate (e.g. 56 should map to 1956)

An example of similar question (this one is 4-year date is): Amazon Athena Convert String to Date

For example, cast( date_parse(trim("date of birth"),'%m/%d/%Y') as date ) as our_date_of_birth gives the laughable: 0094-01-04 instead of a date in 1994

However, if I use: cast( date_parse(trim("date of birth"),'%m/%d/%y') as date ) as our_date_of_birth, it sometimes gives me correct date, but sometimes something like: 2062-07-31 instead of 1962

Finally, just plain:

cast( trim("date of birth") as date ) as our_date_of_birth

gives an error: INVALID_CAST_ARGUMENT: Value cannot be cast to date: 10/11/78

Is there a way to get twentieth century birthdates from these casts in Athena Trino? Obviously there would be edge cases such as 01/01/20 which could map to either 1920 or 2020, but dates like 01/01/50 should definitely map to 1950.

Sample data and outputs:

01/01/56 -- output would be 1956-01-01 as date
01/01/08 -- output would be 2008-01-01 as date
01/01/21 -- output would be 2021-01-01 as date (* some would want 1921 here)
07/01/21 -- output would be 1921-07-01 as date (since as of posting 07/01/2021 would be in future)

**The outuput format isn't crucial, it could be 01/01/1956, just so it is a true 'date' in Athena Trino.**

Upvotes: 0

Views: 2625

Answers (1)

Nicolas Busca
Nicolas Busca

Reputation: 1305

One way would be to subtract 100 years whenever the parsed date is in the future. For example:

select case when 
         parse_datetime(birthdate, 'MM/dd/yy') > current_timestamp then
                    parse_datetime(birthdate, 'MM/dd/yy') - interval '100' year 
         else parse_datetime(birthdate, 'MM/dd/yy') 
       end as birthdate

Note that this would work only until the next century.

The parse_datetime function returns a timestamp object, see the docs: https://prestodb.io/docs/current/functions/datetime.html

parse_datetime uses java's DateFormat conventions. From the docs: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created.

Upvotes: 2

Related Questions