Reputation: 666
I just started working on a massive dataset with 5 million observations and lots and lots of variables. To process this faster, I want to select only some variables of interest and drop the rest.
With keep, I could select a block of variables very simply:
keep varx1-varx5
However, the variables I want are not in order in the dataset:
varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3
I don't want the varz variables. I want only the blocks with varx and vary.
I'm not very good at loops, but I tried this:
foreach varname of varlist varx1-varx3 vary1-vary3 {
keep `varname'
}
This doesn't work, because it keeps only varx1, then tries to keep the others and errors out because they have just been dropped.
How can I tell keep to select multiple blocks of variables?
Upvotes: 0
Views: 4556
Reputation: 903
If you don't know all the variables you want to drop, you can keep only the blocks with varx and vary by using wildcards:
keep varx* vary*
In a Stata variable list, * matches zero or more characters, so varx* matches every variable whose name begins with varx.
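As a quick check of the wildcard approach, here is a minimal sketch on a toy dataset. The variable names simply mirror the layout in the question, and the single empty observation is an assumption made only so the example runs on its own.

```stata
clear
set obs 1

* Toy variables mirroring the question's layout (assumed names)
foreach v in varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3 {
    generate `v' = .
}

* Keep every variable whose name starts with varx or vary
keep varx* vary*

* List the names of the variables that remain
describe, simple
```

Note that the wildcards match on names only, so any other variable that happens to start with varx or vary would be kept as well.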
Upvotes: 1
Reputation: 11102
Rather than using keep, which wipes out every variable not given to the command, try drop, which deletes only those you specify. The loop is not necessary. An example:
clear
set obs 0
*----- example vars -----
gen varx1 = .
gen varx2 = .
gen varx3 = .
gen varz1 = .
gen varz2 = .
gen vary1 = .
gen vary2 = .
gen vary3 = .
*----- what you want -----
drop varz*
Both commands are documented jointly, so help keep or help drop would have gotten you there.
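For completeness, keep itself accepts more than one range in a single call, which answers the question as literally asked. The sketch below reuses the same toy variables (names assumed from the question) and relies on them being stored in the order shown there, since a range like varx1-varx3 follows dataset order.

```stata
clear
set obs 1

* Toy variables created in the same order as in the question (assumed names)
foreach v in varx1 varx2 varx3 varz1 varz2 vary1 vary2 vary3 {
    generate `v' = .
}

* Two ranges in one keep call; no loop is needed
keep varx1-varx3 vary1-vary3

describe, simple
```

The loop in the question fails because each keep runs on its own: the first pass leaves only varx1, so later passes refer to variables that no longer exist. Passing the whole variable list to one keep avoids that.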
Upvotes: 2