Reputation: 21
I have the following data set:
time person1_person2 person2_person1 occurrence cell_count
1 A_B B_A 0 1
2 A_C C_A 0 2
3 B_A A_B 1 3
4 E_A A_E 0 4
5 C_A A_C 1 5
6 E_A A_E 0 6
7 A_B B_A 1 7
In Stata, I am trying to create the occurrence variable. It takes the value 1 if person1_person2 occurs in person2_person1 at an earlier time. For example, at time = 4 and time = 6, occurrence takes the value 0 because E_A has not occurred in the field person2_person1.
I have tried, with no luck:
gen occurrence = 0
local i = cell_count-1
foreach j in `i' {
replace occurrence = 1 if person1_person2 == person2_person1[_n-`j']
}
Upvotes: 1
Views: 72
Reputation: 11112
Admittedly, this is not as straightforward as @Nick Cox's solution, but the general idea is fairly simple: record the time of the first occurrence of every value of p2_p1, then compare it with the time of the current value of p1_p2.
Note that there is no explicit loop here, which is something I was exploring. That does not necessarily mean it is more efficient.
clear all
set more off
input time str3 p1_p2 str3 p2_p1
1 A_B B_A
2 A_C C_A
3 B_A A_B
4 E_A A_E
5 C_A A_C
6 E_A A_E
7 A_B B_A
8 P_M M_P
9 A_B B_A
end
list
tempfile main
save "`main'"
* Create auxiliary data of first occurrences
bysort p2_p1 (time): gen firstflag = (_n == 1) // flag if first occurrence
drop if firstflag == 0 // drop if not first occurrence
drop firstflag p1_p2 // drop unnecessary variables
rename time firsttime // rename accordingly
rename p2_p1 p1_p2 // needed for -merge-
tempfile aux
save "`aux'"
* Merge main data with auxiliary
use "`main'", clear
merge m:1 p1_p2 using "`aux'", keep(master match)
* Compute variable of interest
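* Unmatched pairs have missing firsttime, which Stata treats as larger than any number, so they get 0
gen occurrence = (firsttime < time)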
* List
drop firsttime _merge
sort time
list
Upvotes: 0
Reputation: 37318
As you guessed, one way to do this is with a loop.
clear
input time str3 person1_person2 str3 person2_person1
1 A_B B_A
2 A_C C_A
3 B_A A_B
4 E_A A_E
5 C_A A_C
6 E_A A_E
7 A_B B_A
end
gen occurrence = 0
qui forval i = 2/`=_N' {
local I = `i' - 1
count if person2_person1 == person1_person2[`i'] in 1/`I'
if r(N) replace occurrence = 1 in `i'
}
if r(N) is equivalent to if r(N) > 0, as r(N) being true (non-zero) and being positive are one and the same, because a count can never be negative. r(N) is the result left in memory by count. See e.g. http://www.stata-journal.com/sjpdf.html?articlenum=pr0029 and http://www.stata-journal.com/sjpdf.html?articlenum=pr0033 for tutorials on count.
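To see what count leaves behind, here is a minimal sketch that does by hand the comparison the loop makes for observation 3. It uses only the first three rows of the example data, and the message text in the last line is just an illustration.

```stata
clear
input time str3 person1_person2 str3 person2_person1
1 A_B B_A
2 A_C C_A
3 B_A A_B
end

* How many of observations 1-2 have person2_person1 equal to
* person1_person2 in observation 3 (which is B_A)?
count if person2_person1 == person1_person2[3] in 1/2
display r(N)

* r(N) is non-zero here, so the guarded command runs
if r(N) display "earlier match found"
```

Only observation 1 has B_A in person2_person1, so r(N) should be 1 and the guarded display should run.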
Your code includes the lines
local i = cell_count-1
foreach j in `i' {
The first will be evaluated as
local i = cell_count[1] - 1
which comes out as 0, so your loop is just
foreach j in 0 {
and so is the single line
replace occurrence = 1 if person1_person2 == person2_person1[_n]
or
replace occurrence = 1 if person1_person2 == person2_person1
which tests for simultaneous equality. It's not luck you need, but logic!
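If you want to check this evaluation yourself, a small sketch along these lines should do it. The three-row cell_count variable is a stand-in for your data, not your actual file.

```stata
clear
input cell_count
1
2
3
end

* A variable named without a subscript in a scalar expression
* is evaluated in observation 1, so this is cell_count[1] - 1
local i = cell_count - 1
display `i'

* The list after -in- holds that single value, so the body runs once
foreach j in `i' {
    display "j = `j'"
}
```

The local holds 0, and the loop body runs exactly once with j equal to 0, which is why the replace never looks at earlier observations.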
Upvotes: 3