Delete observations in a panel when identifier is contained in a list of values

I have an unbalanced panel whose panel identifier is the variable member.

I would like to delete particular members from the dataset (i.e., every observation in which they appear), namely those members whose ids appear in a list/vector of values.

Given the list of values of member (say 1, 3, 10, 17, 173, 928), I would like a way to drop every observation where the panel id (member) is contained in the list.

The list is ~1500 values long, so rather than manually typing

drop if member == 1
drop if member == 3
drop if member == 10

drop if member == 928

I would like to somehow automate this process.

Upvotes: 0

Views: 64

Answers (2)

Roberto Ferrer

Reputation: 11102

You do not specify how the list is structured. Please remember to post all details relevant to your problem.

Below are two examples.

clear
set more off

*----- case 1 (list in another .dta file) -----

// a hypothetical list
input ///
idcode
1
3
end

list

tempfile mylist
save "`mylist'"

// rest of data
clear
use http://www.stata-press.com/data/r13/union.dta
list if idcode <= 4, sepby(idcode) 

merge m:1 idcode using "`mylist'", keep(master)
list if idcode <= 4, sepby(idcode) 

*----- case 2 (list in a macro) -----

clear
use http://www.stata-press.com/data/r13/union.dta

// a hypothetical list
local mylist 1, 3

drop if inlist(idcode, `mylist')
list if idcode <= 4, sepby(idcode) 

help inlist mentions the following limit:

The number of arguments is between 2 and 255 for reals and between 2 and 10 for strings.
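If the ~1500 values are already held in a local macro, one way around this limit (a sketch, not taken from either answer) is to loop over the values with foreach and drop one value at a time. The toy dataset and the space-separated list below are hypothetical. This avoids the argument limit, but it makes one pass through the data per value, so for a long list the merge in case 1 is likely to be faster.

```stata
clear
* Hypothetical toy panel: panel id -member-, one row per member-year
input member year
1 2000
1 2001
2 2000
3 2000
10 2001
11 2000
11 2001
end

* Hypothetical list, space-separated (no commas) so that foreach can walk it
local mylist 1 3 10 17 173 928

* One -drop- per value; values absent from the data just delete 0 observations
foreach m of local mylist {
    drop if member == `m'
}

list, sepby(member)
```

Only the three observations on members 2 and 11 remain.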

Upvotes: 1

Nick Cox

Reputation: 37208

@Brendan Cox (namesake, not a relative) has the nub of the matter. To expand a bit:

Note first that

drop if inlist(member,1,3,10,17,173,928)

would be an improvement on your code, but it is both illegal and impractical for a very large number of values: inlist() accepts only a limited number of arguments, and 1500 or so certainly qualifies as very large.

At some critical point it becomes a much better idea to put the identifiers in a file and merge. For more on the spirit of this, see http://www.stata.com/support/faqs/data-management/selecting-subset-of-observations/

It's not a paradox that you merge here (temporarily making a bigger dataset) even though you want to make a smaller dataset. merge identifies the intersection of the datasets (the observations with _merge == 3), which is precisely the set of observations you wish to drop. Creating unions of datasets merely happens to be the main and most obvious motive for using merge, but there are others.
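To make the intersection explicit, here is a sketch with a hypothetical toy panel and drop list that keeps the _merge variable instead of using keep(master). Observations with _merge == 3 are exactly the members on the list; _merge == 2 flags list values that never occur in the panel (928 here), which merge adds as extra rows.

```stata
* Hypothetical list of members to drop, one id per observation
clear
input member
1
3
928
end
tempfile droplist
save "`droplist'"

* Hypothetical toy panel
clear
input member year
1 2000
1 2001
2 2000
3 2000
3 2001
4 2001
end

* m:1 because -member- repeats in the panel but is unique in the list
merge m:1 member using "`droplist'"
tabulate _merge

* _merge == 3: member is on the list (the intersection), so drop it
* _merge == 2: list value absent from the panel, so drop the added row
drop if _merge != 1
drop _merge
```

Afterwards only members 2 and 4 remain, which is the same result keep(master) gives in one step.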

Upvotes: 1
