subset data frame based on character value

Question

I'm trying to subset a data frame that I imported with read.csv using the colClasses='character' option.

A small sample of the data can be found here

Full99<-read.csv("File.csv",header=TRUE,colClasses='character')

After removing duplicates, missing values, and all unnecessary columns, I get a data frame of these dimensions:

>dim(NoMissNoDup99)
[1] 81551     6

I'm interested in reducing the data to only include observations of specific Service.Type values.

I've tried with the subset function:

MU99<-subset(NoMissNoDup99,Service.Type=='Apartment'|
 Service.Type=='Duplex'|
 Service.Type=='Triplex'|
 Service.Type=='Fourplex',
 select=Service.Type:X.13)

 dim(MU99)
[1] 0 6

MU99<-NoMissNoDup99[which(NoMissNoDup99$Service.Type!='Hospital' 
                & NoMissNoDup99$Service.Type!= 'Hotel or Motel'
                & NoMissNoDup99$Service.Type!= 'Industry'
                & NoMissNoDup99$Service.Type!= 'Micellaneous'
                & NoMissNoDup99$Service.Type!= 'Parks & Municipals'
                & NoMissNoDup99$Service.Type!= 'Restaurant'
                & NoMissNoDup99$Service.Type!= 'School or Church or Charity'
                & NoMissNoDup99$Service.Type!='Single Residence'),]

but that doesn't remove observations.

I've tried that same method but slightly tweaked...

MU99<-NoMissNoDup99[which(NoMissNoDup99$Service.Type=='Apartment'
                |NoMissNoDup99$Service.Type=='Duplex'
                |NoMissNoDup99$Service.Type=='Triplex'
                |NoMissNoDup99$Service.Type=='Fourplex'), ]

but that removes every observation...

The final subset should have somewhere around 8000 observations.

I'm pretty new to R and Stack Overflow, so I apologize if there's some convention of posting I've neglected to follow, but if anyone has a magic bullet to get this data to cooperate, I'd love your insights :)

cmbarbu · Accepted Answer

The different methods should all work if the values you compare against matched the data exactly. Your issue is likely extra spaces in the values of Service.Type, so that, for example, 'Apartment ' never equals 'Apartment'.
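
One way to check for this is to look at the raw values directly. The following is a minimal sketch, assuming a toy data frame df that stands in for NoMissNoDup99; the padded values are invented for illustration and are not taken from the real file:

```r
# Hypothetical stand-in for NoMissNoDup99 (values are made up).
df <- data.frame(
  Service.Type = c("Apartment ", "Duplex", " Hospital"),
  stringsAsFactors = FALSE
)

# print() quotes character values, so stray padding becomes visible.
raw_values <- unique(df$Service.Type)
print(raw_values)

# Values that change when whitespace is trimmed are the padded ones.
padded <- raw_values[raw_values != trimws(raw_values)]
print(padded)
```

If padded comes back empty on the real data, whitespace is probably not the cause and the spelling or capitalization of the values is worth checking next.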

You can avoid this kind of issue by using grep, which matches a pattern anywhere in the value, for example:

NoMissNoDup99[grep("Apartment|Duplex|Business",NoMissNoDup99$Service.Type),]
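
An alternative to pattern matching, if whitespace really is the problem, is to trim the column once and then match exactly with %in%. This is only a sketch under that assumption: df and its Usage column are hypothetical stand-ins for the real data, and trimws requires R 3.2.0 or later.

```r
# Hypothetical stand-in for NoMissNoDup99 (values are made up).
df <- data.frame(
  Service.Type = c("Apartment ", " Duplex", "Hospital",
                   "Fourplex  ", "Single Residence"),
  Usage = c("10", "20", "30", "40", "50"),
  stringsAsFactors = FALSE
)

wanted <- c("Apartment", "Duplex", "Triplex", "Fourplex")

# Exact matching on the raw column finds nothing because of the padding.
exact <- df[df$Service.Type %in% wanted, ]

# Strip leading and trailing whitespace, then match exactly.
df$Service.Type <- trimws(df$Service.Type)
multi_unit <- df[df$Service.Type %in% wanted, ]

print(nrow(exact))
print(multi_unit)
```

Compared with grep, exact matching after trimming will not pick up values that merely contain one of the words as a substring, which may or may not be what you want.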
