Reputation: 1078
I have a data.table with 47 variables covering the outcomes of 5007 PhD students. A small sample can be built like this:
library(data.table)
sample <- data.table(
  PHD_STUDENT_ID = 101:120,
  STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney",
                     "Malaysia", "Malaysia", "CLF", "DRR", "GHS", "HMS", "DRJD", "KLS", "Malaysia",
                     "Singapore", "Melbourne", "RD3S", "South Africa", "RME", "Sydney", "Canberra"),
  GRADE = 51:70
)
The resulting data.table looks like this:
  PHD_STUDENT_ID STUDY_LOCATION GRADE
1            101         Sydney    51
2            102       Canberra    52
3            103         Sydney    53
4            104         Sydney    54
5            105       Malaysia    55
6            106       Malaysia    56
7            107            CLF    57
8            108            DRR    58
.........
I need to keep every row except those where STUDY_LOCATION is "Malaysia", "South Africa" or "Singapore", i.e. every student who is not at a campus in one of those countries. There are hundreds of unique values where the study location is just a lab code (e.g. "CLF" and "DRR"), and I want to retain those too, so I can't simply subset by Australian cities.
Any advice on how to subset this data.table to the rows where STUDY_LOCATION is not "Malaysia", "South Africa" or "Singapore" would be greatly appreciated.
Upvotes: 3
Views: 2155
Reputation: 3501
I assume you're learning data.table, so here is a data.table way: set a key on STUDY_LOCATION and use a not-join.
setkey(sample, STUDY_LOCATION)
sample[!c('Malaysia', 'South Africa', 'Singapore')]
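Note that setkey() sorts sample by STUDY_LOCATION by reference, so the original row order is lost. If that matters, the same not-join can probably be written without a key. The following is a minimal sketch, assuming a data.table version that supports the on= argument (1.9.6 or later); the names excluded and kept are illustrative and not from the question.

```r
library(data.table)

# Sample data from the question
sample <- data.table(
  PHD_STUDENT_ID = 101:120,
  STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney",
                     "Malaysia", "Malaysia", "CLF", "DRR", "GHS", "HMS",
                     "DRJD", "KLS", "Malaysia", "Singapore", "Melbourne",
                     "RD3S", "South Africa", "RME", "Sydney", "Canberra"),
  GRADE = 51:70
)

# Illustrative name: the locations to drop
excluded <- c("Malaysia", "South Africa", "Singapore")

# Not-join on STUDY_LOCATION without setting a key;
# rows come back in their original order and `sample` is not modified
kept <- sample[!.(excluded), on = "STUDY_LOCATION"]
```

Here .(excluded) wraps the vector in a one-column list, and on= names the column of sample to match it against, so no key is needed.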
Upvotes: 3
Reputation: 886968
You could try negating %in%:
sample[!STUDY_LOCATION %in% c('Malaysia', 'South Africa', 'Singapore')]
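Because %in% binds more tightly than the unary !, the expression above is evaluated as !(STUDY_LOCATION %in% ...), so no extra parentheses are required. A self-contained sketch on the question's sample data follows; the exclusion list is stored in a variable, and the names excluded, kept and kept_chin are illustrative rather than part of the original answer.

```r
library(data.table)

# Sample data from the question
sample <- data.table(
  PHD_STUDENT_ID = 101:120,
  STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney",
                     "Malaysia", "Malaysia", "CLF", "DRR", "GHS", "HMS",
                     "DRJD", "KLS", "Malaysia", "Singapore", "Melbourne",
                     "RD3S", "South Africa", "RME", "Sydney", "Canberra"),
  GRADE = 51:70
)

# Illustrative name: the locations to drop
excluded <- c("Malaysia", "South Africa", "Singapore")

# Explicit parentheses; same result as the unparenthesised form
kept <- sample[!(STUDY_LOCATION %in% excluded)]

# %chin% is data.table's character-only variant of %in%
kept_chin <- sample[!STUDY_LOCATION %chin% excluded]
```

Neither form needs a key, and both leave sample itself unchanged; lab codes such as "CLF" and "DRR" are kept because only the three listed values are excluded.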
Upvotes: 3