slap-a-da-bias

Reputation: 406

subset data frame based on character value

I'm trying to subset a data frame that I imported with read.csv using the colClasses='character' option.

A small sample of the data can be found here

Full99<-read.csv("File.csv",header=TRUE,colClasses='character')

After removing duplicates, missing values, and all unnecessary columns, I get a data frame of these dimensions:

>dim(NoMissNoDup99)
[1] 81551     6

I'm interested in reducing the data to include only observations with specific Service.Type values.

I've tried with the subset function:

MU99<-subset(NoMissNoDup99,Service.Type=='Apartment'|
 Service.Type=='Duplex'|
 Service.Type=='Triplex'|
 Service.Type=='Fourplex',
 select=Service.Type:X.13)

>dim(MU99)
[1] 0 6

MU99<-NoMissNoDup99[which(NoMissNoDup99$Service.Type!='Hospital' 
                & NoMissNoDup99$Service.Type!= 'Hotel or Motel'
                & NoMissNoDup99$Service.Type!= 'Industry'
                & NoMissNoDup99$Service.Type!= 'Micellaneous'
                & NoMissNoDup99$Service.Type!= 'Parks & Municipals'
                & NoMissNoDup99$Service.Type!= 'Restaurant'
                & NoMissNoDup99$Service.Type!= 'School or Church or Charity'
                & NoMissNoDup99$Service.Type!='Single Residence'),]

but that doesn't remove any observations.

I've also tried the same method, slightly tweaked:

MU99<-NoMissNoDup99[which(NoMissNoDup99$Service.Type=='Apartment'
                |NoMissNoDup99$Service.Type=='Duplex'
                |NoMissNoDup99$Service.Type=='Triplex'
                |NoMissNoDup99$Service.Type=='Fourplex'), ]

but that removes every observation...

The final subset should have somewhere around 8000 observations.

I'm pretty new to R and Stack Overflow, so I apologize if there's some convention of posting I've neglected to follow, but if anyone has a magic bullet to get this data to cooperate, I'd love your insights :)

Upvotes: 1

Views: 12030

Answers (2)

Amrita Sawant

Reputation: 10913

    ## exclude
    MU99<-subset(NoMissNoDup99,!(Service.Type %in% c('Hospital','Hotel or Motel')))

    ## include
    MU99<-subset(NoMissNoDup99,Service.Type %in% c('Apartment','Duplex'))
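
To see what the two calls above do, here is a minimal sketch on a toy data frame. The data frame `df`, its `Usage` column, and the vector `multi_unit` are made up for illustration and are not the asker's data; `%in%` simply replaces the long chain of `==` / `|` comparisons.

```r
# Toy data standing in for NoMissNoDup99 (hypothetical values)
df <- data.frame(
  Service.Type = c("Apartment", "Duplex", "Hospital",
                   "Single Residence", "Fourplex"),
  Usage = c("10", "20", "30", "40", "50"),
  stringsAsFactors = FALSE
)

# The service types to keep, listed once
multi_unit <- c("Apartment", "Duplex", "Triplex", "Fourplex")

# include: rows whose Service.Type is one of the listed values
kept <- subset(df, Service.Type %in% multi_unit)

# exclude: rows whose Service.Type is NOT one of the listed values
dropped <- subset(df, !(Service.Type %in% multi_unit))

nrow(kept)
nrow(dropped)
```

Note that `%in%` still does exact string matching, so it will return zero rows for the same reason the `==` version does if the values in the data contain stray characters.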

Upvotes: 1

cmbarbu

Reputation: 4534

The different methods should all work if you were using the right variable values. Your issue is likely extra whitespace in the values of Service.Type.

You can avoid this kind of issue by using grep, for example:

NoMissNoDup99[grep("Apartment|Duplex|Business",NoMissNoDup99$Service.Type),]
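
If whitespace really is the cause, one way to check and fix it is sketched below. This assumes the padding is ordinary leading or trailing spaces; the toy data frame `df` and the vector `wanted` are hypothetical. `trimws()` is in base R from version 3.2.0.

```r
# Toy data with padded values (hypothetical, mimicking the suspected problem)
df <- data.frame(
  Service.Type = c("Apartment ", " Duplex", "Hospital", "Fourplex  "),
  stringsAsFactors = FALSE
)

# Exact comparison finds nothing because of the padding
n_exact <- sum(df$Service.Type == "Apartment")

# Printing the distinct values with quotes makes stray spaces visible
print(unique(df$Service.Type))

# Strip leading/trailing whitespace, then exact matching behaves as expected
df$Service.Type <- trimws(df$Service.Type)
wanted <- c("Apartment", "Duplex", "Triplex", "Fourplex")
MU <- df[df$Service.Type %in% wanted, , drop = FALSE]

nrow(MU)
```

Compared with grep, trimming keeps the match exact: a pattern such as "Apartment" would also match any longer value that merely contains that word.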

Upvotes: 1
