Reputation: 49
I want to create a variable that gives the number of months between the two daily date variables in my data set (end_date and start_date). I want to count the month of the start date if it is the 1st day of the month and similarly, I want to count the month of the end date if it is the last day of the month.
For example, let's say that the start_date is 01feb2010 and end_date is 12jun2010. I want to count February and the months between February and June but not the month June. So, 4 months in total.
Upvotes: 0
Views: 2266
Reputation: 339
Only to complement the undisputed NC:
If you're on Stata 17, there is now a new function datediff_frac()
which allows to calculate the difference between two dates in months:
clear
input float(start end)
18294 18425
18294 18659
18294 18440
18294 18474
18295 18474
end
format %td start* end*
** new datediff_frac() **
gen exact_dif = datediff_frac(start, end, "month")
gen months_past = floor(exact_dif)
list
+-------------------------------------------------+
| start end exact_dif months_past |
|-------------------------------------------------|
1. | 01feb2010 12jun2010 4.366667 4 |
2. | 01feb2010 01feb2011 12 12 |
3. | 01feb2010 27jun2010 4.866667 4 |
4. | 01feb2010 31jul2010 5.967742 5 |
5. | 02feb2010 31jul2010 5.935484 5 |
+-------------------------------------------------+
More info here
Upvotes: 0
Reputation: 37183
Extending the helpful answer of @Cybernike:
You want, it seems, the number of completed months between one start date and another end date. Pushing daily dates through mofd()
and getting the difference is the basis of an answer, except that we need to add 1 in general, but also subtract 1 unless the end date is the last day of its month.
Repeat after me: the last day of the current month is the day before the first day of the next month. The first day of the next month is easy to find, and all you need to do then is subtract 1 to get the daily date of the last day of the current month. There is no need to check up on varying month length, 31, 30, 29 or 28 days. There is a paper riffing on this theme. See here.
clear
input float(start end)
18294 18425
18294 18659
18294 18440
18294 18474
18295 18474
end
gen end2 = dofm(mofd(end) + 1) - 1
gen start2 = dofm(mofd(start))
format %td start* end*
gen wanted1 = mofd(end) - mofd(start) + 1 - (end2 != end)
gen wanted2 = mofd(end) - mofd(start) + 1 - (start2 != start) - (end2 != end)
list
+-------------------------------------------------------------------+
| start end end2 start2 wanted1 wanted2 |
|-------------------------------------------------------------------|
1. | 01feb2010 12jun2010 30jun2010 01feb2010 4 4 |
2. | 01feb2010 01feb2011 28feb2011 01feb2010 12 12 |
3. | 01feb2010 27jun2010 30jun2010 01feb2010 4 4 |
4. | 01feb2010 31jul2010 31jul2010 01feb2010 6 6 |
5. | 02feb2010 31jul2010 31jul2010 01feb2010 6 5 |
+-------------------------------------------------------------------+
However, there is an ambiguity in your question. Could the start date be other than the first day of its month? If so, then do you need to check for that condition too? See wanted2
for a stricter solution.
Upvotes: 3
Reputation: 1103
Here is a very crude solution:
gen diff_days = end_date - start_date
gen diff_months = floor(diff_days/30.437)
list
+---------------------------------------------+
| start_d~e end_date diff_d~s diff_m~s |
|---------------------------------------------|
1. | 01feb2010 12jun2010 131 4 |
2. | 01feb2010 01feb2011 365 12 |
3. | 01feb2010 27jun2010 146 4 |
+---------------------------------------------+
There is probably a better way to do this. Note that the average calendar month length is 30.437 days.
Upvotes: 3