Star

Transform string monthly dates in Stata

I have a problem in Stata with the format of the dates. I believe it is a very simple question but I can't see how to fix it.

I have a CSV file (file.csv) that looks like this:

v1            v2
01/01/2000    1.1
01/02/2000    1.2
01/03/2000    1.3
...
01/12/2000    1.12
01/01/2001    1.1
...
01/12/2001    1.12

The format of v1 is dd/mm/yyyy.

I import the file into Stata using import delimited ...file.csv

v1 is a string variable, v2 is a float.

I want to transform v1 into a monthly date variable that Stata can read.

My attempts:

1)

gen Time = date(v1, "DMY")
format Time %tm

which gives me

Time
3177m7
3180m2
3182m7
...

which looks wrong.

2) Alternatively:

gen v1_1=v1
replace v1_1 = substr(v1_1,4,length(v1_1))
gen Time_1 = date(v1_1, "MY")
format Time_1 %tm

which gives exactly the same result.

And if I type

tsset Time, format(%tm)

it reports that there are gaps, but there are no gaps in the data.

Could you help me to understand what I'm doing wrong?

Answers (1)

ChrisP

Stata has wonderful documentation on dates and times, which you should read from beginning to end if you plan on using time-related variables. Reading this documentation will not only solve your current problem, but will potentially prevent costly errors in the future. The section related to your question is titled "SIF-to-SIF conversion." SIF means "Stata internal form."

To explain your current issue:

Stata stores dates as numbers; you interpret them as "dates" when you assign a format. Consider the following:

set obs 1
gen dt = date("01/01/2003", "DMY")
list dt
// 15706

So that date is stored as the value 15706. Let's format it as a daily date:

format dt %td
list
// 01jan2003

Now let's format it to be a month:

format dt %tm
list
// 3268m11
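To see where 3268m11 comes from: the %tm format reads the stored value 15706 as a count of months since 1960m1, and 15706 = 12 × 1308 + 10, i.e. 1308 years and 10 months after January 1960. The short check below uses only built-in functions (mdy(), ym(), floor(), mod()) and the display command; it is meant to be run as a do-file.

```stata
* 15706 is the number of days from 01jan1960 to 01jan2003
display mdy(1, 1, 2003)

* Read the same number as months since 1960m1: 15706 = 12*1308 + 10
display 1960 + floor(15706/12)
display mod(15706, 12) + 1

* The %tm format performs exactly this reading
display %tm 15706

* ym() maps year 3268, month 11 back to the same stored number
display ym(3268, 11)
```

The same arithmetic explains the values in the question: 01/01/2000 is stored as 14610 days, and 14610 read as months is 3177m7.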

Notice that dt is just a number that you can format and use like a day or month. To get a "month number" from a "day number", do the following:

gen mt = mofd(dt)  // mofd = month of day
format mt %tm
list
//      dt       mt
// 3268m11   2003m1

The variable mt now equals 516: January 2003 is 516 months after January 1960. Stata's epoch is January 1, 1960 00:00:00.000. Daily date variables are stored as days since the epoch, datetime variables as milliseconds since the epoch, and monthly date variables as months since January 1960 (that is how the %tm format determines which month to show).
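Applied to the question, the fix is to wrap date() in mofd() before assigning the %tm format. The sketch below types in the first three rows from the question with input instead of reading file.csv, so it runs on its own; the helper variable names Day and Time_1 are illustrative, not from the question. It also shows why tsset reported gaps: consecutive first-of-month daily values are 28 to 31 apart, so with delta 1 the daily numbers look like a series with holes. Parsing only the "mm/yyyy" part with monthly() is an alternative route to the same monthly value.

```stata
* First rows from the question; v1 is a dd/mm/yyyy string
clear
input str10 v1 v2
"01/01/2000" 1.1
"01/02/2000" 1.2
"01/03/2000" 1.3
end

* Daily SIF: first-of-month values are 28-31 days apart,
* which is why tsset on this number (even formatted %tm) finds gaps
gen Day = date(v1, "DMY")
format Day %td

* Monthly SIF: convert days since 01jan1960 to months since 1960m1
gen Time = mofd(Day)
format Time %tm

* Alternative: parse only the "mm/yyyy" substring as a monthly date
gen Time_1 = monthly(substr(v1, 4, .), "MY")
format Time_1 %tm

* Consecutive months now differ by exactly 1, so no gaps are reported
tsset Time
list
```

With the real file, replace the input block by the import delimited command from the question; the remaining lines stay the same.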
