emilBeBri

Reputation: 666

Using Stata's keep command on multiple blocks of variables

I just started working on a massive dataset with 5 million observations and lots and lots of variables. To process this faster, I want to select only some variables of interest and drop the rest.

With keep, I can select a single block of variables very simply:

keep varx1-varx3

However, the variables I want are not in order in the dataset:

varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3

I don't want the varz variables; I want only the varx and vary blocks.

So. I'm not very good at loops, but I tried this:

foreach varname of varlist varx1-varx3 vary1-vary3  {
keep `varname'
}

This doesn't work, because it keeps only varx1, then tries to keep the others, and errors out because they have just been dropped.

How can I tell keep to select multiple blocks of variables?

Upvotes: 0

Views: 4556

Answers (2)

GPierre

Reputation: 903

If you don't want to list every variable by name, you can use wildcards to keep only the blocks with varx and vary:

keep varx* vary*

In a Stata varlist, * matches zero or more characters, so varx* matches every variable whose name starts with varx.
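Before dropping anything from a large dataset, it can help to preview which variables a wildcard pattern matches. The following is a minimal sketch, assuming a toy dataset that recreates the variable names from the question; ds accepts a varlist and lists the matching names without changing the data.

```stata
* Toy dataset with the variable names from the question (assumed for illustration)
clear
set obs 1
foreach v in varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3 {
    generate `v' = .
}

* Preview which variables the wildcard patterns match; nothing is dropped yet
ds varx* vary*

* Keep only the matching variables
keep varx* vary*
```

Note that a pattern such as varx* also matches any other variable whose name starts with varx, so the preview step is worth running when the names in the real dataset are not fully known.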

Upvotes: 1

Roberto Ferrer

Reputation: 11102

Rather than using keep, which drops every variable not given to the command, try drop, which deletes only the variables you specify. No loop is necessary. An example:

clear 
set obs 0

*----- example vars -----

gen varx1 = .
gen varx2 = .
gen varx3 = .
gen varz1 = .
gen varz2 = .
gen vary1 = .
gen vary2 = .
gen vary3 = .

*----- what you want -----

drop varz*

Both commands are documented jointly, so help keep or help drop would have gotten you there.
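For completeness, keep itself accepts several blocks in one varlist, and use can load only the wanted variables from disk, which may matter with millions of observations. This is a sketch under stated assumptions: the toy variables mirror the question, and the temporary file is a hypothetical stand-in for the real dataset.

```stata
* Toy dataset with the variable names from the question (assumed for illustration)
clear
set obs 1
foreach v in varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3 {
    generate `v' = .
}

* Temporary file standing in for the large dataset on disk (hypothetical)
tempfile big
save `big'

* Option 1: name each block as a range in a single keep command
keep varx1-varx3 vary1-vary3
describe, simple

* Option 2: read only the wanted variables when loading the file
use varx* vary* using `big', clear
describe, simple
```

The second option avoids loading the unwanted variables into memory in the first place, whereas keep and drop operate on data that has already been loaded.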

Upvotes: 2
