Reputation: 63760
Note up front, my question turns out to be similar to SO question 1668172.
This is a design question that surely must have popped up for others before, yet I couldn't find an answer that fits my situation. I want to record date-of-birth in my application, with several 'levels' of information:
NULL
value, i.e. DoB is unkown1950-??-??
Only the DoB year value is known, date/month aren't????-11-23
Just a month, day, or combination of the two, but without a year1950-11-23
Full DoB is knownThe technologies I'm using for my app are as follows:
Possibilities for the SQL bit that crossed my mind so far:
1950-11-23
, and replace unkowns with 'X's, e.g. XXXX-11-23
or 1950-XX-XX
1950
, 11
, and 23
For the C# end of this problem I merely got to these two options:
The solutions seem to form matched pairs at 1A, 2B or 3C. Of course 1A isn't a nice solution, but it does set a baseline.
Any tips and links are highly appreciated. Well, if they're related, anyhow :)
Edit, about the answers: I marked one answer as accepted, because I think it will work for me. It's worth looking at the other answers too though, if you've stumbled here with the same question.
Upvotes: 11
Views: 968
Reputation: 50251
The SQL Side
My latest idea on this subject is to use a range for dates that are uncertain or can have different specificity. Given two columns:
DobFromDate (inclusive)
DobToDate (exclusive)
Here's how it would work with your scenarios:
Specificity DobFromDate DobToDate
----------- ----------- ----------
YMD 2006-05-05 2006-05-06
YM 2006-05-01 2006-06-01
Y 2006-01-01 2007-01-01
Unknown 0000-01-01 9999-12-31
-> MD, M, D not supported with this scheme
Note that there's no reason this couldn't be carried all the way to hour, minute, second, millisecond, and so on.
Then when querying for people born on a specific day:
DECLARE @BornOnDay date = '2006-05-16'
-- Include lower specificity:
SELECT *
FROM TheTable
WHERE
DobFromDate <= @BornOnDay
AND @BornOnDay < DobToDate;
-- Exclude lower specificity:
SELECT *
FROM TheTable
WHERE
DobFromDate = @BornOnDay
AND DobToDate = DateAdd(Day, 1, @BornOnDay);
This to me has the best mix of maintainability, ease of use, and expressive power. It won't handle loss of precision in the more significant values (e.g., you know the month and day but not the year) but if that can be worked around then I think it is a winner.
If you will ever be querying by date, then in general the better solutions (in my mind) are going to be those that preserve the items as dates on the server in some fashion.
Also, note that if you're looking for a date range rather than a single day, with my solution you still only need two conditions, not four:
DECLARE
@FromBornOnDay date = '2006-05-16',
@ToBornOnDay date = '2006-05-23';
-- Include lower specificity:
SELECT *
FROM TheTable
WHERE
DobFromDate < @ToBornOnDay
AND @FromBornOnDay < DobToDate;
The C# Side
I would use a custom class with all the methods needed to do appropriate date math and date comparisons on it. You know the business requirements for how you will use dates that are unknown, and can encode the logic within the class. If you need something before a certain date, will you use only known or unknown items? What will ToString()
return? These are things, in my mind, best solved with a class.
Upvotes: 3
Reputation: 598
Obviously, all of the solutions mentioned above do represent some kind of compromise.
Therefore, I would recommend to think carefully which of the 'levels' is the most likely one and optimize for that. Afterwards go for proper exception handling for the other rare cases.
I don't know whether reporting is an issue for you right now or may be later, but you might consider that as third dimension apart from the DB / C# issues.
Upvotes: 1
Reputation: 29157
Whatever you do is going to be messy DB wise. For consumers of these kind of dates, I would write a special class/struct which encapsulates what sort of date it is (I'd probably call it something like PartialDate), to make it easier to deal with for consumers- much like Martin Fowler advocates a Money Class.
If you expose a DateTime directly in C#, this could lead to confusion if you had a "date" of ????-11-23 and you wanted to determine if the customer was over 18 for example- how would you default the date, how would the consumer know that part of the date was invalid etc...
The added benefit of having a PartialDate is it will allow other people reading your code to quickly realise that they are not normal, complete dates and should not be treated as such!
Edit
Thinking about the Partial data concept some more, I decided to Google. I found that There is the concept of Partial on Joda time and an interesting PDF on the topic, which may or may not be useful to you.
Upvotes: 2
Reputation: 6344
I would not worry to much about how to store the date, I would still store the date within a datetime field, BUT, if knowing if some part of the date was not populated, I would have flags for each section of the date that is not valid, so your schema would be:
DBODate as Date DayIsSet as Bit MonthIsSet as Bit YearIsSet as Bit.
That way you can still implement all the valid date comparisons, and still know the precision of the date you are working on. (as for the date, I would always default to the missing portion as the min of that value: IE Month default is January, day is the first, year is 1900 or something).
Upvotes: 1
Reputation: 37398
Interesting problem...
I like solution 2B over solution 3C because with 3C, it wouldn't be normalized... when you update one of the ints, you'd have to update the DateTime as well or you would be out of sync.
However, when you read the data into your C# end, I'd have a property that would roll up all the ints into a string formatted like you have in solution 1 so that it could easily be displayed.
I'm curious what type of reporting you'll need to do on this data... or if you'll simply be storing and retrieving it from the database.
Upvotes: 1
Reputation: 44605
I like the idea of 3 int nullable columns and a struct of 3 nullable int in C#.
it does take some effort in db handling but you can avoid parsing around strings and you can also query with SQL directly by year or year and month and so on...
Upvotes: 2