Stata: How do I check whether the value of one variable exists somewhere in the column of another variable?

Question

I have two variables in Stata which are string (MAIDCW1 and strboth).

I would like to check for each observation whether the value of MAIDCW1 exists somewhere in the whole of strboth. If this value exists in the variable, I would like to set the variable TreatmentFinal to 1, otherwise to 0.

I tried the following code variants but I always receive the error code "invalid syntax":

**Variant 1:**

gen TreatmentFinal = 0
forvalues i = 1 / _N {

    quietly count if strboth == MAIDCW1
    
    if r(N) > 0 {
        replace TreatmentFinal = 1 in `i'
    } else {
        replace TreatmentFinal = 0 in `i'
    }
}


**Variant 2:**

gen TreatmentFinal = 0
forvalues i = 1 / _N {

    qui gen match = cond(index(strboth, MAIDCW1[`i']) > 0, 1, 0)
    
    replace TreatmentFinal = match in `i'
   
    drop match
}

I also tried to first format strboth and MAIDCW1 as strings, but this does not change the result and I still receive the error code "invalid syntax".

Addendum:

1.) I use an Excel dataset and randomize a market ID to a treatment (1) & control group (0). So, the variable Treatment is either 1 or 0.

2.) I merge the market ID (e.g., 046914YBB) and Treatment which becomes strboth (e.g., 046914YBB1).

3.) Now I look at a different column which has several entries of the same market ID (for different calender weeks) and add "1" to all of them in order to compare it to strboth, which would indicate whether it belongs to the treatment group (as the market ID entry in strboth also has a "1" at the end) or to the control group (if the entry has a "0" at the end). This variable then is MAIDCW1. Entries with the same marked IDs should always be in the same group (either treatment or control).

4.) Then, for all entries of MAIDCW1 I check whether the value also exists in strboth in order to define for all market IDs of the different calendar weeks whether they belong to the treatment or control group (TreamtentFinal). -> And here the problem of Nick's code appears: The same entries of market IDs for different calendar weeks (entries in MAIDCW1) receive different values in TreatmentFinal (0 or 1) even though they all should have the same value.

Here is the complete code including Nick’s suggestion:

clear
  import excel "Example.xlsx", sheet("Example") firstrow

set seed 123
bysort MarktName: gen double rand1=rnormal() if _n==1
egen max1=max(rand1), by(MarktName)
replace rand1=max1

bysort AbteilungName: gen double rand2=rnormal() if _n==1
egen max2=max(rand2), by(AbteilungName)
replace rand2=max2

sort rand1 rand2

// Randomization pattern
gen Randomisierung =

"111000011100001110000111100011110001111000011100001110000111100011"

// Assignment to treatment (1) or control group (0) 
gen Treatment = substr(Randomisierung, _n, 1)

drop Randomisierung

//Here the relevant part begins:

gen TreatmentFinal = 0 
gen strboth = MAID + Treatment
gen long obsno = _n 
egen long group = group(strboth) 
su group, meanonly  
local G = r(max) 

quietly forval g = 1/`G' { 
    su obsno if group == `g', meanonly 
    count if strboth[r(min)] == MAIDCW1 
    if r(N) > 0 replace TreatmentFinal = 1 if group == `g' 
}

Stata: How do I check whether the value of one variable exists somewhere in the column of another variable?

Answers (1)

Related Questions