Jeroen
Jeroen

Reputation: 63760

How to design Date-of-Birth in DB and ORM for mix of known and unknown date parts

Note up front, my question turns out to be similar to SO question 1668172.


This is a design question that surely must have popped up for others before, yet I couldn't find an answer that fits my situation. I want to record date-of-birth in my application, with several 'levels' of information:

The technologies I'm using for my app are as follows:

Possibilities for the SQL bit that crossed my mind so far:

For the C# end of this problem I merely got to these two options:

The solutions seem to form matched pairs at 1A, 2B or 3C. Of course 1A isn't a nice solution, but it does set a baseline.

Any tips and links are highly appreciated. Well, if they're related, anyhow :)


Edit, about the answers: I marked one answer as accepted, because I think it will work for me. It's worth looking at the other answers too though, if you've stumbled here with the same question.

Upvotes: 11

Views: 968

Answers (6)

ErikE
ErikE

Reputation: 50251

The SQL Side

My latest idea on this subject is to use a range for dates that are uncertain or can have different specificity. Given two columns:

DobFromDate (inclusive)
DobToDate (exclusive)

Here's how it would work with your scenarios:

Specificity   DobFromDate   DobToDate
-----------   -----------   ----------
YMD            2006-05-05   2006-05-06
YM             2006-05-01   2006-06-01
Y              2006-01-01   2007-01-01
Unknown        0000-01-01   9999-12-31
-> MD, M, D not supported with this scheme

Note that there's no reason this couldn't be carried all the way to hour, minute, second, millisecond, and so on.

Then when querying for people born on a specific day:

DECLARE @BornOnDay date = '2006-05-16'

-- Include lower specificity:
SELECT *
FROM TheTable
WHERE
   DobFromDate <= @BornOnDay
   AND @BornOnDay < DobToDate;

-- Exclude lower specificity:
SELECT *
FROM TheTable
WHERE
   DobFromDate = @BornOnDay
   AND DobToDate = DateAdd(Day, 1, @BornOnDay);

This to me has the best mix of maintainability, ease of use, and expressive power. It won't handle loss of precision in the more significant values (e.g., you know the month and day but not the year) but if that can be worked around then I think it is a winner.

If you will ever be querying by date, then in general the better solutions (in my mind) are going to be those that preserve the items as dates on the server in some fashion.

Also, note that if you're looking for a date range rather than a single day, with my solution you still only need two conditions, not four:

DECLARE
   @FromBornOnDay date = '2006-05-16',
   @ToBornOnDay date = '2006-05-23';

-- Include lower specificity:
SELECT *
FROM TheTable
WHERE
   DobFromDate < @ToBornOnDay
   AND @FromBornOnDay < DobToDate;

The C# Side

I would use a custom class with all the methods needed to do appropriate date math and date comparisons on it. You know the business requirements for how you will use dates that are unknown, and can encode the logic within the class. If you need something before a certain date, will you use only known or unknown items? What will ToString() return? These are things, in my mind, best solved with a class.

Upvotes: 3

bonifaz
bonifaz

Reputation: 598

Obviously, all of the solutions mentioned above do represent some kind of compromise.

Therefore, I would recommend to think carefully which of the 'levels' is the most likely one and optimize for that. Afterwards go for proper exception handling for the other rare cases.

I don't know whether reporting is an issue for you right now or may be later, but you might consider that as third dimension apart from the DB / C# issues.

Upvotes: 1

RichardOD
RichardOD

Reputation: 29157

Whatever you do is going to be messy DB wise. For consumers of these kind of dates, I would write a special class/struct which encapsulates what sort of date it is (I'd probably call it something like PartialDate), to make it easier to deal with for consumers- much like Martin Fowler advocates a Money Class.

If you expose a DateTime directly in C#, this could lead to confusion if you had a "date" of ????-11-23 and you wanted to determine if the customer was over 18 for example- how would you default the date, how would the consumer know that part of the date was invalid etc...

The added benefit of having a PartialDate is it will allow other people reading your code to quickly realise that they are not normal, complete dates and should not be treated as such!

Edit

Thinking about the Partial data concept some more, I decided to Google. I found that There is the concept of Partial on Joda time and an interesting PDF on the topic, which may or may not be useful to you.

Upvotes: 2

Nathan Tregillus
Nathan Tregillus

Reputation: 6344

I would not worry to much about how to store the date, I would still store the date within a datetime field, BUT, if knowing if some part of the date was not populated, I would have flags for each section of the date that is not valid, so your schema would be:

DBODate as Date DayIsSet as Bit MonthIsSet as Bit YearIsSet as Bit.

That way you can still implement all the valid date comparisons, and still know the precision of the date you are working on. (as for the date, I would always default to the missing portion as the min of that value: IE Month default is January, day is the first, year is 1900 or something).

Upvotes: 1

Michael Fredrickson
Michael Fredrickson

Reputation: 37398

Interesting problem...

I like solution 2B over solution 3C because with 3C, it wouldn't be normalized... when you update one of the ints, you'd have to update the DateTime as well or you would be out of sync.

However, when you read the data into your C# end, I'd have a property that would roll up all the ints into a string formatted like you have in solution 1 so that it could easily be displayed.

I'm curious what type of reporting you'll need to do on this data... or if you'll simply be storing and retrieving it from the database.

Upvotes: 1

Davide Piras
Davide Piras

Reputation: 44605

I like the idea of 3 int nullable columns and a struct of 3 nullable int in C#.

it does take some effort in db handling but you can avoid parsing around strings and you can also query with SQL directly by year or year and month and so on...

Upvotes: 2

Related Questions