Reputation: 1594
AWS Athena (Trino) Convert birthdate string (mm/dd/yy) to date -- need twentieth century
I have found similar questions, but they either aren't specific to Athena, or to this two-digit to date format for a birthdate (e.g. 56 should map to 1956)
An example of similar question (this one is 4-year date is): Amazon Athena Convert String to Date
For example, cast( date_parse(trim("date of birth"),'%m/%d/%Y') as date ) as our_date_of_birth
gives the laughable: 0094-01-04 instead of a date in 1994
However, if I use:
cast( date_parse(trim("date of birth"),'%m/%d/%y') as date ) as our_date_of_birth
,
it sometimes gives me correct date, but sometimes something like: 2062-07-31 instead of 1962
Finally, just plain:
cast( trim("date of birth") as date ) as our_date_of_birth
gives an error: INVALID_CAST_ARGUMENT: Value cannot be cast to date: 10/11/78
Is there a way to get twentieth century birthdates from these casts in Athena Trino? Obviously there would be edge cases such as 01/01/20 which could map to either 1920 or 2020, but dates like 01/01/50 should definitely map to 1950.
Sample data and outputs:
01/01/56 -- output would be 1956-01-01 as date
01/01/08 -- output would be 2008-01-01 as date
01/01/21 -- output would be 2021-01-01 as date (* some would want 1921 here)
07/01/21 -- output would be 1921-07-01 as date (since as of posting 07/01/2021 would be in future)
**The outuput format isn't crucial, it could be 01/01/1956, just so it is a true 'date' in Athena Trino.**
Upvotes: 0
Views: 2625
Reputation: 1305
One way would be to subtract 100 years whenever the parsed date is in the future. For example:
select case when
parse_datetime(birthdate, 'MM/dd/yy') > current_timestamp then
parse_datetime(birthdate, 'MM/dd/yy') - interval '100' year
else parse_datetime(birthdate, 'MM/dd/yy')
end as birthdate
Note that this would work only until the next century.
The parse_datetime
function returns a timestamp object, see the docs: https://prestodb.io/docs/current/functions/datetime.html
parse_datetime
uses java's DateFormat conventions. From the docs:
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created.
Upvotes: 2