Reputation: 103
I think that my question is similar to this one: Drop observations in panel data using Stata but I am still doing something wrong and it isn't quite working for me.
I have panel data with the following variables:
- Year
- Month
- Subject
- Trial
- Attempt
- Reward
Each subject has 4 trials (or rounds), with 5 attempts per round. The reward changes by attempt and round, but the 5th (last) attempt ALWAYS has reward = 2. For each subject, one of the 4 trials was randomly chosen to have reward = 2 on all 5 attempts (normally attempts 1-4 have reward = 1). I need to delete those "bonus trials".
I know that I need to use by (http://www.stata.com/manuals13/dby.pdf), but I seem to be doing it incorrectly. If I do this:
by trial: drop if attempt == 2 & reward == 2
then I get
not sorted.
If I do this:
by trial, sort: drop if attempt == 2 & reward == 2
it drops 1 observation, when I need it to drop all 5 observations in that trial.
data example:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Year str3 Month byte(Subject Trial Attempt Reward) str1 Todrop
2016 "Feb" 1 1 1 1 ""
2016 "Feb" 1 1 2 1 ""
2016 "Feb" 1 1 3 1 ""
2016 "Feb" 1 1 4 1 ""
2016 "Feb" 1 1 5 2 ""
2016 "Feb" 1 2 1 1 ""
2016 "Feb" 1 2 2 1 ""
2016 "Feb" 1 2 3 1 ""
2016 "Feb" 1 2 4 1 ""
2016 "Feb" 1 2 5 2 ""
2016 "Feb" 1 3 1 2 "*"
2016 "Feb" 1 3 2 2 "*"
2016 "Feb" 1 3 3 2 "*"
2016 "Feb" 1 3 4 2 "*"
2016 "Feb" 1 3 5 2 "*"
2016 "Feb" 2 1 1 1 ""
2016 "Feb" 2 1 2 1 ""
2016 "Feb" 2 1 3 1 ""
2016 "Feb" 2 1 4 1 ""
2016 "Feb" 2 1 5 2 ""
2016 "Feb" 2 2 1 2 "*"
2016 "Feb" 2 2 2 2 "*"
2016 "Feb" 2 2 3 2 "*"
2016 "Feb" 2 2 4 2 "*"
2016 "Feb" 2 2 5 2 "*"
2016 "Feb" 2 3 1 1 ""
2016 "Feb" 2 3 2 1 ""
2016 "Feb" 2 3 3 1 ""
2016 "Feb" 2 3 4 1 ""
2016 "Feb" 2 3 5 2 ""
end
Above is an example for two subjects. What I would like to be able to do is to drop all of trial 3 for subject 1, and all of trial 2 for subject 2 (the starred trials) but not to drop the others (non-starred trials). That is, the variable Todrop is * for observations to be dropped and empty otherwise.
Upvotes: 0
Views: 296
Reputation: 37208
Assuming data are read in as in your example, you can identify which observations to drop by
bysort Year Month Subject Trial (Reward) : gen todrop = Reward[1] == 2 & Reward[5] == 2
The principles are:
- Define groups by cross-combinations of variables. As you say, by: gives a framework here.
- Every value of Reward must be 2 in groups to be dropped. If so, then it's necessary and sufficient that, after sorting on Reward within groups, the first and the last values both be 2.
Verify that todrop as defined above is 1 if and only if Todrop is * (e.g. look at a tabulate *drop, missing).
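If you prefer a check that fails loudly rather than a table to eyeball, something like the following sketch may help. It assumes the example data are in memory and that todrop has already been generated as above; nothing else is new.

```stata
* Stops with an error if the computed flag ever disagrees with the hand-marked stars
assert todrop == (Todrop == "*")

* Same comparison as a two-way table, with the variables named explicitly
tabulate Todrop todrop, missing
```

If the assert passes silently, the two variables agree on every observation.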
Once satisfied,
drop if todrop
Much more at this Stata FAQ.
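As a variation on the same idea, here is a minimal sketch that does not depend on there being exactly 5 attempts per trial. It assumes the example data are in memory and that Reward only takes the values 1 and 2; the variable name min_reward is purely illustrative.

```stata
* Smallest reward within each subject's trial (min_reward is a made-up name)
bysort Year Month Subject Trial: egen min_reward = min(Reward)

* Rewards are only 1 or 2 here, so a minimum of 2 means every attempt paid 2
drop if min_reward == 2

* Remove the helper variable once it has done its job
drop min_reward
```

On the example data this should remove the 10 starred observations and leave the other 20. Run it on the full dataset rather than after drop if todrop, since it is an alternative to that step, not a follow-up.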
Upvotes: 1
Reputation: 19375
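* Attempt 2 pays 2 only in a bonus trial (variable names are case-sensitive)
gen flag_temp = 1 if Attempt == 2 & Reward == 2
* Spread the flag over all attempts of the same subject's trial, not the trial number alone
bysort Year Month Subject Trial: egen flag = min(flag_temp)
drop if flag == 1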
boom
Upvotes: 0