Collapse two rows in stata into one that share values in different columns

Question

I have whittled down a Stata dataset into something like this example of two rows:

id   location     start_datetime         end_datetime
1    64^6418    25feb2009 15:45:54    25feb2009 15:58:54
1    64^6418    25feb2009 15:58:54    09mary2009 15:16:06

As you can see the end_datetime of one row for the same id and location is the exact same as the start_datetime of the next row. I'd like to collapse rows that fit this criteria so that they are just one row, with the start_datetime of the first row and the end_datetime of the second row. For example, I would like rows that look like the above to be transformed to this

id   location     start_datetime         end_datetime
1    64^6418    25feb2009 15:45:54    09mary2009 15:16:06

I have accomplished this change with the below code, but it is sort of hack-y and I wonder if there is a better way to do this in Stata:

// Sort by id and start_datetime
sort id location start_datetime

//Check if next start_datetime is the same as current end_datetime
by id (location): gen same_as_next = start_datetime[_n+1] == end_datetime & location[_n+1] == location

//Create flag variable that switches to 1 whenever there is a change in the location or a non-contiguous stays
gen change_flag = same_as_next == 0 if same_as_next != .

//Fill in missing values with 0
replace change_flag = 0 if change_flag == .

* Revisit this step to debug
//Generate groups of contiguous stays
by id (location): gen group = 1 + sum(change_flag)

//Group values are skewed by 1 row so create temp var to store next group value
by id (location): gen temp_group = group[_n-1]

//Force row 1 to be group 1 
replace temp_group = 1 if _n == 1

replace temp_group = 1 if temp_group == .

collapse (min) start_datetime (max) end_datetime, by(id location temp_group)

drop temp_group

Collapse two rows in stata into one that share values in different columns

Answers (1)

Related Questions