elevator_man

Reputation: 43

Stata: adding a number to a date variable

I am working with hospital admission data, where admission date and discharge date are stored in clock format %tcCCYY-NN-DD_hh:MM_AM, for example

  discharge date
2009-04-21 9:00 AM

So the date information is stored as milliseconds since January 1, 1960, and transforming this into a numeric double variable gives me

discharge date
1556269200000

Now, I would like to shift some of my date variables by 1 minute (just an example), and generate a new variable

gen new_discharge_date = discharge_date + 60*1000

This shifts the discharge date by exactly one minute only by coincidence.

In the example above this will instead give me

new_discharge_date 
2009-04-21 9:00 AM

or as double

new_discharge_date
 1556269236224

The difference between new_discharge_date and discharge_date is only 36224 milliseconds instead of 60000.

The problem occurs systematically; sometimes the number of milliseconds since January 1, 1960, will even be lower than before.

Any idea what I am doing wrong?

Upvotes: 1

Views: 1336

Answers (1)

Nick Cox

Reputation: 37208

Executive summary: Adding a constant to a date-time variable with units milliseconds creates another date-time variable. Both variables should be type double.

First note that clock is not a storage format in Stata. Clock date-time variables are stored as integers; clock format is a numeric display format, which is quite different. In fact the description in the original question is backwards: the date-time data arrive as strings, which are then converted to milliseconds with the clock() function.
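The distinction between storage and display can be seen by showing one stored number under two formats. A minimal sketch, using the date from the question and the same clock() mask as in the sandbox further down:

```stata
* the stored value: milliseconds since 01jan1960 00:00:00, shown as a plain number
display %15.0f clock("2009-04-21 9:00 AM", "YMD hm")

* the same value shown with a clock display format; only the display changes
display %tc clock("2009-04-21 9:00 AM", "YMD hm")
```

Changing the format of a variable works the same way: the number held in memory is untouched.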

You are correct that clock date-times should be stored as doubles, as they are often very large integers. For precisely that reason, your shifted date-time (1 minute more than the original values) should not be stored in a float, which is what your generate does by default. You need to specify double in the generate statement. A float can hold values this large only approximately, which is why you observe errors. This is easy to check using your example as a sandbox.
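The rounding can also be isolated with Stata's float() function, which rounds its argument to float precision. A minimal sketch, assuming the millisecond value 1556269200000 quoted in the question; the fuller sandbox follows it:

```stata
* the exact sum, which a double holds without loss
display %15.0f 1556269200000 + 60*1000

* the same sum rounded to float precision, as a default generate would store it
display %15.0f float(1556269200000 + 60*1000)

* floats of this magnitude are spaced 2^17 milliseconds apart
display 2^17
```

The second line should give 1556269236224, the value reported in the question. Neighbouring floats near 1.5 trillion are 131072 ms (about 131 seconds) apart, so a one-minute shift cannot be represented exactly and may round up or down.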

. clear

. set obs 1 
number of observations (_N) was 0, now 1

. gen s_discharge_date = "2009-04-21 9:00 AM"

. gen double discharge_date = clock(s_discharge_date, "YMD hm") 

. format discharge_date %tc 

. gen double new_discharge_date = discharge_date + 60*1000

. format new %tc

. gen long new_discharge_date2 = discharge_date + 60*1000

. format new_discharge_date2 %tc

. list 

     +--------------------------------------------------------------+
  1. |   s_discharge_date |     discharge_date | new_discharge_date |
     | 2009-04-21 9:00 AM | 21apr2009 09:00:00 | 21apr2009 09:01:00 |
     |--------------------------------------------------------------|
     |                           new_di~2                           |
     |                                  .                           |
     +--------------------------------------------------------------+

The advice given in a comment to use long is wrong, as the last experiment shows immediately. Fairly recent date-times have values in trillions, some orders of magnitude larger than could be held in a long. help data types shows the limits on values in various types.
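The limit can also be read from Stata's system values. A small sketch, assuming c(maxlong), which holds the largest value a long can store, and reusing the date from the example above:

```stata
* largest nonmissing value that fits in a long
display %15.0f c(maxlong)

* the example date-time in milliseconds since 01jan1960 00:00:00
display %15.0f clock("2009-04-21 9:00 AM", "YMD hm")

* how many times too large the date-time is for a long
display clock("2009-04-21 9:00 AM", "YMD hm") / c(maxlong)
```

The ratio comes out in the hundreds, which is why generate long returns missing here.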

Upvotes: 2
