Msh


Stata: duplicates within a 5-minute time range

For the following dataset example:

 11-12-2014 21:59
 11-12-2014 21:59
 11-12-2014 22:00
 11-12-2014 22:06

I need to treat observations that are less than 5 minutes apart as duplicates and then use that grouping in a "bysort" command. How can I define duplicates as observations that are less than 5 minutes apart?


Answers (1)

user4690969

This answer is incomplete: for clarity I used simple numbers rather than Stata date-time values. It does, however, show the fundamental idea.

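clear
input float x
 1
 3
 9
13
17
end
* the logic assumes the data are sorted in time order
sort x
* run holds the first value of the current set of "duplicates"
generate run = 0
replace run = x in 1
* replace works observation by observation, so run[_n-1] is already updated
* "<" matches the question's "less than 5 apart" criterion
replace run = cond(x < run[_n-1] + 5, run[_n-1], x) if _n > 1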

This gives the following result, in which the variable run identifies each set of "duplicate" observations by your criterion:

. list

     +----------+
     |  x   run |
     |----------|
  1. |  1     1 |
  2. |  3     1 |
  3. |  9     9 |
  4. | 13     9 |
  5. | 17    17 |
     +----------+
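A sketch of the same idea applied to the time stamps from the question. It assumes the stamps are month-day-year strings held in a variable named stamp; the names stamp, t, run and n_in_run are illustrative, and "11-12-2014" is ambiguous, so the mask would need to be "DMYhm" if the dates are day-month-year. Stata date-time values count milliseconds, so the 5-minute window is msofminutes(5), and they must be stored as double to avoid precision loss. As in the code above, each run is anchored to its first observation, and "<" keeps the question's "less than 5 minutes" criterion. The final "keep" line is only one example of using run in a "bysort" command.

```stata
clear
input str16 stamp
"11-12-2014 21:59"
"11-12-2014 21:59"
"11-12-2014 22:00"
"11-12-2014 22:06"
end

* convert to Stata datetime (ms since 01jan1960); assumes month-day-year order
* datetime values need double storage to keep full precision
generate double t = clock(stamp, "MDYhm")
format t %tc

* the run logic only works on time-ordered data
sort t

* run = time stamp of the first observation in the current set of "duplicates"
generate double run = t in 1
replace run = cond(t < run[_n-1] + msofminutes(5), run[_n-1], t) if _n > 1
format run %tc

* example use of the grouping: count members per run, then keep one per run
bysort run: generate n_in_run = _N
by run: keep if _n == 1
list
```

If each observation should instead be compared with the one immediately before it (so that a chain of short gaps forms a single group), the condition would use t[_n-1] in place of run[_n-1].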
