Reputation: 7
Consider the following example data:
11-12-2014 21:59
11-12-2014 21:59
11-12-2014 22:00
11-12-2014 22:06
I need to treat observations that are less than five minutes apart as duplicates and then use that grouping in a "bysort" command. Does anyone know how I can define duplicates as observations that are <5 minutes apart?
Upvotes: 0
Views: 48
Reputation:
This is an incomplete answer, since for clarity I used simple numbers rather than Stata time values. But it shows the fundamental idea.
clear
input float x
1
3
9
13
17
end
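* assumes the data are sorted by x; run will hold the first x of each group
generate run = 0
replace run = x in 1
* replace works through the observations in order, so run[_n-1] is already
* updated; strict < matches the "less than five apart" criterion
replace run = cond(x<run[_n-1]+5,run[_n-1],x) if _n>1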
This gives the following result. The variable run holds the first value of x
in each group, so it identifies sets of "duplicate" observations by your criterion.
. list
     +----------+
     |  x   run |
     |----------|
  1. |  1     1 |
  2. |  3     1 |
  3. |  9     9 |
  4. | 13     9 |
  5. | 17    17 |
     +----------+
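To carry the same idea over to the timestamps in the question, here is a minimal sketch rather than a definitive solution. It assumes the strings are day-month-year (the sample is ambiguous, so switch the mask to "MDY hm" if they are month-day-year), and the variable names stamp, t and n_in_run are made up for illustration. Stata datetimes are milliseconds stored as doubles, so the five-minute window is msofminutes(5).

```stata
clear
input str16 stamp
"11-12-2014 21:59"
"11-12-2014 21:59"
"11-12-2014 22:00"
"11-12-2014 22:06"
end

* assumption: day-month-year order; use "MDY hm" if month comes first
generate double t = clock(stamp, "DMY hm")
format t %tc
sort t

* run holds the first timestamp of each group of "duplicates"
generate double run = t in 1
replace run = cond(t < run[_n-1] + msofminutes(5), run[_n-1], t) if _n > 1
format run %tc

* run can now be used with bysort, e.g. to count observations per group
bysort run: generate n_in_run = _N
list stamp run n_in_run
```

Note that each observation is compared with the start of its group, not with the observation just before it, so a group never spans five minutes or more. If you would rather chain observations whenever consecutive ones are less than five minutes apart, compare t with t[_n-1] instead of run[_n-1].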
Upvotes: 2