drop section of panel data in Stata

Question

I think that my question is similar to this one: "Drop observations in panel data using Stata", but I am still doing something wrong and it isn't quite working for me.

I have panel data with the following variables: Year - Month - Subject - Trial - Attempt - Reward

Each subject has 4 trials (or rounds), with 5 attempts per round. The reward changes by attempt and round; the 5th (last) attempt ALWAYS has reward = 2. For each subject, one of the 4 trials was randomly chosen to have reward = 2 on all 5 attempts (normally attempts 1-4 have reward = 1). I need to delete those "bonus trials".

I know that I need to use by (http://www.stata.com/manuals13/dby.pdf), but I seem to be doing it incorrectly. If I do this:

by trial: drop if attempt == 2 & reward == 2

then I get

not sorted.

If I do this:

by trial, sort: drop if attempt == 2 & reward == 2

it drops 1 observation, when I need it to drop all 5 observations in that trial.

data example:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int Year str3 Month byte(Subject Trial Attempt Reward) str1 Todrop
2016 "Feb" 1 1 1 1 "" 
2016 "Feb" 1 1 2 1 "" 
2016 "Feb" 1 1 3 1 "" 
2016 "Feb" 1 1 4 1 "" 
2016 "Feb" 1 1 5 2 "" 
2016 "Feb" 1 2 1 1 "" 
2016 "Feb" 1 2 2 1 "" 
2016 "Feb" 1 2 3 1 "" 
2016 "Feb" 1 2 4 1 "" 
2016 "Feb" 1 2 5 2 "" 
2016 "Feb" 1 3 1 2 "*"
2016 "Feb" 1 3 2 2 "*"
2016 "Feb" 1 3 3 2 "*"
2016 "Feb" 1 3 4 2 "*"
2016 "Feb" 1 3 5 2 "*"
2016 "Feb" 2 1 1 1 "" 
2016 "Feb" 2 1 2 1 "" 
2016 "Feb" 2 1 3 1 "" 
2016 "Feb" 2 1 4 1 "" 
2016 "Feb" 2 1 5 2 "" 
2016 "Feb" 2 2 1 2 "*"
2016 "Feb" 2 2 2 2 "*"
2016 "Feb" 2 2 3 2 "*"
2016 "Feb" 2 2 4 2 "*"
2016 "Feb" 2 2 5 2 "*"
2016 "Feb" 2 3 1 1 "" 
2016 "Feb" 2 3 2 1 "" 
2016 "Feb" 2 3 3 1 "" 
2016 "Feb" 2 3 4 1 "" 
2016 "Feb" 2 3 5 2 "" 
end

Above is an example for two subjects. What I would like to be able to do is to drop all of trial 3 for subject 1, and all of trial 2 for subject 2 (the starred trials) but not to drop the others (non-starred trials). That is, the variable Todrop is * for observations to be dropped and empty otherwise.

Nick Cox · Accepted Answer

Assuming data are read in as in your example, you can identify which observations to drop by

 bysort Year Month Subject Trial (Reward) : gen todrop = Reward[1] == 2 & Reward[5] == 2

The principles are:

Define groups by cross-combinations of variables. As you say, by: gives a framework here.
Every value of Reward must be 2 in groups to be dropped. After sorting on Reward within groups, that is true if and only if the first and the last values are both 2.
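A minimal sketch of the same idea that does not depend on every group having exactly 5 observations, assuming the question's example data are in memory. The variable name allbonus is made up for illustration. It also shows why the attempt in the question fell short: a condition such as Attempt == 2 & Reward == 2 is evaluated observation by observation, even under by:, so it can only remove the single matching row in each bonus trial, whereas the flag below is computed from the whole group.

```stata
* Assumes the example data from the question are in memory
* (variables Year Month Subject Trial Attempt Reward).
* allbonus is a hypothetical name, not part of the original answer.

* Sort on Reward within each trial; if the smallest and the largest
* values are both 2, every value in the group is 2.
* _N is the number of observations in the current by-group.
bysort Year Month Subject Trial (Reward): gen byte allbonus = Reward[1] == 2 & Reward[_N] == 2

* Inspect the flag group by group
list Subject Trial Attempt Reward allbonus, sepby(Subject Trial)
```

With the example data, Reward[_N] and Reward[5] refer to the same observation, so this flag should agree with todrop above.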

Verify that todrop as defined above is 1 if and only if Todrop is "*" (e.g. look at the output of tabulate *drop, missing).
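One possible way to run that check, assuming todrop has been created as above and Todrop is the string marker from the example data. The cross-tabulation is for eyeballing; assert is a stricter check that stops with an error if any observation disagrees.

```stata
* Assumes todrop (0/1, generated above) and Todrop ("*" or empty) are in memory.

* Two-way table: all observations should fall in the cells
* (Todrop empty, todrop 0) and (Todrop "*", todrop 1).
tabulate Todrop todrop, missing

* Fails with an error unless todrop is 1 exactly where Todrop is "*"
assert todrop == (Todrop == "*")
```

If assert returns silently, the two markers agree on every observation.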

Once satisfied,

drop if todrop

Much more at this Stata FAQ,
