Reputation: 43
I am working with hospital admission data, where information on admission date and discharge date is stored in clock format %tcCCYY-NN-DD_hh:MM_AM, for example
discharge date
2009-04-21 9:00 AM
So the date information is stored as milliseconds since January 1, 1960, and transforming this into a numeric double variable gives me
discharge date
1556269200000
Now, I would like to shift some of my date variables by 1 minute (just an example), and generate a new variable
gen new_discharge_date = discharge_date + 60*1000
This will only incidentally shift the discharge date by exactly one minute.
In the example above this will instead give me
new_discharge_date
2009-04-25 9:00 AM
or as double
new_discharge_date
1556269236224
The difference between new_discharge_date and discharge_date is only 36224 milliseconds instead of 60000.
The problem occurs systematically; sometimes the number of milliseconds since January 1, 1960, will even be lower than before.
Any idea what I am doing wrong?
Upvotes: 1
Views: 1336
Reputation: 37208
Executive summary: Adding a constant to a date-time variable with units milliseconds creates another date-time variable. Both variables should be type double.
First note that clock is not a storage format in Stata. Clock date-time variables are stored as integers; clock format is a numeric display format, which is quite different. In fact the description in the original question is backwards: the date-time data arrive as strings, which are then converted to milliseconds with the clock() function.
You are correct that clock date-times should be stored as doubles, as they are often very large integers, but for precisely that reason your shifted date-time (1 minute more than the original values) should not be stored in a float, which is what your generate does by default. You need to specify double in the generate statement. A float carries only about 7 significant digits, so at values around 1.5 trillion it can only land on multiples of 131,072 milliseconds (a little over two minutes). Using float instead therefore just gives a crude approximation, which is why you observe errors. This is easy to check using your example as sandbox.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen s_discharge_date = "2009-04-21 9:00 AM"
. gen double discharge_date = clock(s_discharge_date, "YMD hm")
. format discharge_date %tc
. gen double new_discharge_date = discharge_date + 60*1000
. format new %tc
. gen long new_discharge_date2 = discharge_date + 60*1000
. format new_discharge_date2 %tc
. list
     +--------------------------------------------------------------+
  1. |   s_discharge_date |     discharge_date | new_discharge_date |
     | 2009-04-21 9:00 AM | 21apr2009 09:00:00 | 21apr2009 09:01:00 |
     |--------------------------------------------------------------|
     |                           new_di~2                           |
     |                                  .                           |
     +--------------------------------------------------------------+
The advice given in a comment to use long is wrong, as the last experiment shows immediately. Fairly recent date-times have values in the trillions, some orders of magnitude larger than could be held in a long. help data types shows the limits on values in various types.
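If you want to see the rounding directly, here is a minimal sketch that is not part of the log above. It assumes the double value quoted in the question (1556269200000) and uses only the built-in float() function and the c(maxlong) system value. The first display shows the exact sum; the second shows that sum rounded to float precision, which should reproduce the 1556269236224 reported in the question; the third shows the largest value a long can hold, which is only about 2.1 billion.

```stata
* Sketch only: these lines are not from the original answer's log.
* 1556269200000 is the double value quoted in the question.

* Exact result of adding one minute, evaluated in double precision
display %20.0f 1556269200000 + 60*1000

* The same result rounded to float precision, which is what
* -generate- stores when no storage type is specified
display %20.0f float(1556269200000 + 60*1000)

* Largest value that fits in a long, for comparison with the above
display %20.0f c(maxlong)
```

The gap between the first two results is the approximation error from storing the new variable as a float; the third result shows why long is not an alternative.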
Upvotes: 2