Reputation: 21

Finding a match among values occurring previously

I have the following data set

time  person1_person2    person2_person1   occurrence   cell_count
  1        A_B                B_A               0           1
  2        A_C                C_A               0           2
  3        B_A                A_B               1           3
  4        E_A                A_E               0           4
  5        C_A                A_C               1           5
  6        E_A                A_E               0           6
  7        A_B                B_A               1           7

In Stata, I am trying to create the occurrence variable. It takes the value 1 if the current value of person1_person2 occurs in person2_person1 at an earlier time, and 0 otherwise. For example, at time = 4 and time = 6 occurrence takes the value 0 because E_A has not occurred earlier in the field person2_person1.

I have tried, with no luck:

gen occurrence = 0 
local i = cell_count-1
foreach j in `i' {
replace occurrence = 1 if person1_person2 == person2_person1[_n-`j']
}

Upvotes: 1

Answers (2)

Roberto Ferrer

Reputation: 11112

Admittedly, this is not as straightforward as @Nick Cox's solution, but the general idea is fairly simple: record the time at which each value of p2_p1 first occurs, then compare it with the time of the current value of p1_p2.

Note that there is no explicit loop here, which is something I wanted to explore. That does not necessarily make it more efficient.

clear all
set more off

input time  str3 p1_p2   str3 p2_p1   
1        A_B                B_A               
2        A_C                C_A               
3        B_A                A_B               
4        E_A                A_E               
5        C_A                A_C               
6        E_A                A_E               
7        A_B                B_A     
8        P_M                M_P             
9        A_B                B_A
end 

list

tempfile main
save "`main'"

* Create auxiliary data of first occurrences
bysort p2_p1 (time): gen firstflag = (_n == 1) // flag if first occurrence
drop if firstflag == 0 // drop if not first occurrence
drop firstflag p1_p2 // drop unnecessary variables
rename time firsttime // rename accordingly
rename p2_p1 p1_p2 // needed for -merge-

tempfile aux
save "`aux'"

* Merge main data with auxiliary
use "`main'", clear
merge m:1 p1_p2 using "`aux'", keep(master match)

* Compute variable of interest
* (firsttime is missing for unmatched observations; missing is never < time, so those get 0)
gen occurrence = (firsttime < time)

* List
drop firsttime _merge
sort time
list

Upvotes: 0

Nick Cox

Reputation: 37318

As you guessed, one way to do this is with a loop.

clear 
input time  str3 person1_person2   str3 person2_person1   
1        A_B                B_A               
2        A_C                C_A               
3        B_A                A_B               
4        E_A                A_E               
5        C_A                A_C               
6        E_A                A_E               
7        A_B                B_A               
end 

gen occurrence = 0 

qui forval i = 2/`=_N' { 
    local I = `i' - 1 
    count if person2_person1 == person1_person2[`i'] in 1/`I' 
    if r(N) replace occurrence = 1 in `i' 
}

The condition if r(N) is equivalent to if r(N) > 0: a count can never be negative, so r(N) being true (non-zero) and being positive are one and the same. r(N) is the result left in memory by count. See e.g. http://www.stata-journal.com/sjpdf.html?articlenum=pr0029 and http://www.stata-journal.com/sjpdf.html?articlenum=pr0033 for tutorials on count.
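As a minimal sketch of how count leaves its result in r(N), assuming the same seven-observation example data (the value "A_B" and the range 1/3 are picked purely for illustration):

```stata
clear
input time str3 person1_person2 str3 person2_person1
1 A_B B_A
2 A_C C_A
3 B_A A_B
4 E_A A_E
5 C_A A_C
6 E_A A_E
7 A_B B_A
end

* count how often "A_B" appears in person2_person1 among observations 1 to 3
count if person2_person1 == "A_B" in 1/3

* count leaves its result in r(N); copy it if it is needed later,
* because the next r-class command will overwrite it
local n = r(N)
display "matches found: `n'"

* if r(N) is true whenever the count is non-zero
if r(N) display "at least one match in observations 1 to 3"
```

Only observation 3 holds "A_B" in person2_person1 within that range, so the condition is true here.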

Your code includes the lines

local i = cell_count-1
foreach j in `i' {

The first will be evaluated as

local i = cell_count[1] - 1

which comes out as 0, so your loop is just

foreach j in 0 {

and so reduces to the single line

replace occurrence = 1 if person1_person2 == person2_person1[_n]

that is,

replace occurrence = 1 if person1_person2 == person2_person1

This tests for equality within the same observation, not against earlier observations. It's not luck you need, but logic!
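For comparison, the original idea of looking back with [_n-`j'] can be made to work by looping over every possible lag rather than a single one. This is only a sketch, assuming the data are sorted by time and that person1_person2 is never empty (an out-of-range subscript such as person2_person1[0] returns an empty string, which would otherwise count as a match):

```stata
clear
input time str3 person1_person2 str3 person2_person1
1 A_B B_A
2 A_C C_A
3 B_A A_B
4 E_A A_E
5 C_A A_C
6 E_A A_E
7 A_B B_A
end

sort time
gen occurrence = 0

* try every lag from 1 to _N - 1; subscripts below 1 yield "" and never match
quietly forvalues j = 1/`=_N-1' {
    replace occurrence = 1 if person1_person2 == person2_person1[_n-`j']
}

list
```

This makes _N - 1 passes over the data, one per lag, and flags an observation as soon as any earlier person2_person1 matches its person1_person2.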

Upvotes: 3
