scoopfaze
scoopfaze

Reputation: 221

Range of times strings in Stata

I'm working with a Stata dataset that has periods of time saved in a rather odd way, in a string with the word "to" as a an indication of the range of time, with markers for the twelve hour clock, for example, "20march2020 1 p.m. to 3 p.m." I was wondering what the best way of parsing/using this information is, particularly with respect to datetime. I have read through the datetime documentation, and while it's useful for specific times of day, it is not particularly helpful when it comes to ranges of times.

I was considering separating the string into two strings, with the start and end of the range of times, e.g. "20march2020 1 p.m." and "20march2020 3 p.m.", but I was curious if there was a more direct solution to make this data workable. The main concern I have about my approach is automating a change of date if the time interval crosses midnight, e.g. "20march2020 11 p.m. to 1 a.m.". Any suggestions would be sincerely appreciated.

Here is some sample data:

input str28 times

"17may2020 1 p.m. to 10 p.m."
"17may2020 10 p.m. to 5 a.m." 
"18may2020 5 a.m. to noon"
"18may2020 noon to 7 p.m."
"18may2020 7 p.m. to 1 a.m."

Upvotes: 1

Views: 230

Answers (1)

Wouter
Wouter

Reputation: 3261

clear
input str28 times
"17may2020 1 p.m. to 10 p.m."
"17may2020 10 p.m. to 5 a.m." 
"18may2020 5 a.m. to noon"
"18may2020 noon to 7 p.m."
"18may2020 7 p.m. to 1 a.m."
end

// Noon won't be recognized by clock(), so replace with 12 p.m.
replace times = subinstr(times, "noon", "12 p.m.", .)

// Split times in two variables
gen times_only = substr(times, 11, .)
split times_only , parse("to")

// Generate datetime variables
gen double datetime1 = clock(substr(times,1,10) + times_only1, "DMYh")
gen double datetime2 = clock(substr(times,1,10) + times_only2, "DMYh")
format datetime1 datetime2 %tc

// If datetime2 is before datetime1, add one day (86400000 milliseconds)
replace datetime2 = datetime2 + 86400000 if datetime2 < datetime1

// Drop auxiliary variables
drop times_only*

// Admire the results
list

     +-----------------------------------------------------------------------+
     |                       times            datetime1            datetime2 |
     |-----------------------------------------------------------------------|
  1. | 17may2020 1 p.m. to 10 p.m.   17may2020 13:00:00   17may2020 22:00:00 |
  2. | 17may2020 10 p.m. to 5 a.m.   17may2020 22:00:00   18may2020 05:00:00 |
  3. | 18may2020 5 a.m. to 12 p.m.   18may2020 05:00:00   18may2020 12:00:00 |
  4. | 18may2020 12 p.m. to 7 p.m.   18may2020 12:00:00   18may2020 19:00:00 |
  5. | 18may2020 7 p.m. to 12 a.m.   18may2020 19:00:00   19may2020 00:00:00 |
     +-----------------------------------------------------------------------+

Upvotes: 2

Related Questions