Reputation: 221
I'm working with a Stata dataset that has periods of time saved in a rather odd way, in a string with the word "to" as a an indication of the range of time, with markers for the twelve hour clock, for example, "20march2020 1 p.m. to 3 p.m." I was wondering what the best way of parsing/using this information is, particularly with respect to datetime
. I have read through the datetime
documentation, and while it's useful for specific times of day, it is not particularly helpful when it comes to ranges of times.
I was considering separating the string into two strings, with the start and end of the range of times, e.g. "20march2020 1 p.m." and "20march2020 3 p.m.", but I was curious if there was a more direct solution to make this data workable. The main concern I have about my approach is automating a change of date if the time interval crosses midnight, e.g. "20march2020 11 p.m. to 1 a.m.". Any suggestions would be sincerely appreciated.
Here is some sample data:
input str28 times
"17may2020 1 p.m. to 10 p.m."
"17may2020 10 p.m. to 5 a.m."
"18may2020 5 a.m. to noon"
"18may2020 noon to 7 p.m."
"18may2020 7 p.m. to 1 a.m."
Upvotes: 1
Views: 230
Reputation: 3261
clear
input str28 times
"17may2020 1 p.m. to 10 p.m."
"17may2020 10 p.m. to 5 a.m."
"18may2020 5 a.m. to noon"
"18may2020 noon to 7 p.m."
"18may2020 7 p.m. to 1 a.m."
end
// Noon won't be recognized by clock(), so replace with 12 p.m.
replace times = subinstr(times, "noon", "12 p.m.", .)
// Split times in two variables
gen times_only = substr(times, 11, .)
split times_only , parse("to")
// Generate datetime variables
gen double datetime1 = clock(substr(times,1,10) + times_only1, "DMYh")
gen double datetime2 = clock(substr(times,1,10) + times_only2, "DMYh")
format datetime1 datetime2 %tc
// If datetime2 is before datetime1, add one day (86400000 milliseconds)
replace datetime2 = datetime2 + 86400000 if datetime2 < datetime1
// Drop auxiliary variables
drop times_only*
// Admire the results
list
+-----------------------------------------------------------------------+
| times datetime1 datetime2 |
|-----------------------------------------------------------------------|
1. | 17may2020 1 p.m. to 10 p.m. 17may2020 13:00:00 17may2020 22:00:00 |
2. | 17may2020 10 p.m. to 5 a.m. 17may2020 22:00:00 18may2020 05:00:00 |
3. | 18may2020 5 a.m. to 12 p.m. 18may2020 05:00:00 18may2020 12:00:00 |
4. | 18may2020 12 p.m. to 7 p.m. 18may2020 12:00:00 18may2020 19:00:00 |
5. | 18may2020 7 p.m. to 12 a.m. 18may2020 19:00:00 19may2020 00:00:00 |
+-----------------------------------------------------------------------+
Upvotes: 2