Chris

Reputation: 9509

Read rows with specific column values

I want to extract a set of rows of an existing dataset:

 dataset.x <- dataset[(as.character(dataset$type))=="x",]

However, when I run

   summary(dataset.x$type)

It displays all types which were present in the original dataset. Basically I get a result that says

   x 12354235    #the correct itemcount
   y 0
   z 0
   a 0
   ...

Not only is the presence of 0 elements ugly, but it also messes up any plot of dataset.x due to the presence of hundreds of entries with the value 0.

Upvotes: 0

Views: 1572

Answers (4)

Greg Snow

Reputation: 49640

Others have explained what is happening and how to fix it; I just want to show why it is a desirable default.

Consider the following sample code:

mydata <- data.frame( 
    x = factor( rep( c(0:5,0:5), c(0,5,10,20,10,5,5,10,20,10,5,0))),
    sex = rep( c('F','M'), each=50 ) )

mydata.males <- mydata[ mydata$sex=='M', ]
mydata.males.dropped <- droplevels(mydata.males)

mydata.females <- mydata[ mydata$sex=='F', ]
mydata.females.dropped <- droplevels(mydata.females)

par(mfcol=c(2,2))
barplot(table(mydata.males$x), main='Male', sub='Default')
barplot(table(mydata.females$x), main='Female', sub='Default')

barplot(table(mydata.males.dropped$x), main='Male', sub='Drop')
barplot(table(mydata.females.dropped$x), main='Female', sub='Drop')

Which produces this plot:

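[Figure: 2x2 grid of bar plots. Left column: 'Male' and 'Female' with the default levels kept ('Default'). Right column: 'Male' and 'Female' after droplevels ('Drop').]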

Now, which is the more meaningful comparison: the 2 plots on the left, or the 2 plots on the right?

Instead of dropping unused levels it may be better to rethink what you are doing. If the main goal is to get the count of the x's then you can use sum rather than subsetting and getting the summary. And how meaningful can a plot be on a variable that you have already forced to be a single value?
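As a minimal sketch of the sum approach mentioned above: the toy data frame below is a made-up stand-in for the asker's data (only the names dataset and type come from the question), so treat it as an illustration rather than the exact fix for the real dataset.

```r
# Hypothetical stand-in for the asker's data; only the names
# `dataset` and `type` are taken from the question.
dataset <- data.frame(type = factor(c("x", "y", "x", "z", "x")))

# The comparison yields a logical vector; sum() counts the TRUEs,
# so no subsetting (and no leftover factor levels) is involved.
n_x <- sum(dataset$type == "x")
n_x
```

If the column can contain NA values, sum(dataset$type == "x", na.rm = TRUE) avoids getting NA back as the count.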

Upvotes: 3

joran

Reputation: 173527

Building on Chase's answer, subsetting and dropping unused levels in factors comes up a lot, so it pays to just create your own function by combining droplevels and subset:

subsetDrop <- function(...){droplevels(subset(...))}
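A short usage sketch of that helper, assuming a toy data frame (the value column and the example rows are invented for illustration; dataset and type follow the question):

```r
# Helper as defined in this answer.
subsetDrop <- function(...) { droplevels(subset(...)) }

# Hypothetical example data.
dataset <- data.frame(type  = factor(c("x", "y", "x", "z")),
                      value = 1:4)

# Subset and drop the unused factor levels in one call.
dataset.x <- subsetDrop(dataset, type == "x")

levels(dataset.x$type)   # only the levels still present remain
nrow(dataset.x)
```

Because subset() evaluates its condition non-standardly, a wrapper like this is probably best kept for interactive use; inside other functions, explicit indexing followed by droplevels() may be the safer choice.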

Upvotes: 3

Rguy

Reputation: 1652

Try

dataset$type <- as.character(dataset$type)

followed by your original code. It's probably just that R is still treating that column as a factor and is keeping all of the information about that factor in the column.
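A rough sketch of this approach on made-up data (the value column and the rows are assumptions for illustration). Note that summary() on a character column only reports length, class and mode, so table() is used here to get the counts:

```r
# Hypothetical example data; `dataset` and `type` follow the question.
dataset <- data.frame(type  = factor(c("x", "y", "x", "z")),
                      value = 1:4)

# Convert the factor to plain character strings, as suggested above.
dataset$type <- as.character(dataset$type)

# The original subsetting step, now on a character column.
dataset.x <- dataset[dataset$type == "x", ]

# table() tabulates only the values actually present.
counts <- table(dataset.x$type)
counts
```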

Upvotes: 1

Chase

Reputation: 69151

I'm assuming this is a factor? If so, droplevels() can be used: http://stat.ethz.ch/R-manual/R-patched/library/base/html/droplevels.html
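For example, a minimal sketch assuming type really is a factor (the toy data below is invented; only dataset, dataset.x and type come from the question):

```r
# Hypothetical example data.
dataset <- data.frame(type  = factor(c("x", "y", "x", "z")),
                      value = 1:4)

# Subsetting as in the question keeps all original factor levels.
dataset.x <- dataset[as.character(dataset$type) == "x", ]
before <- summary(dataset.x$type)   # still lists y and z with zero counts

# droplevels() removes the levels that no longer occur in the subset.
dataset.x <- droplevels(dataset.x)
after <- summary(dataset.x$type)    # only x remains

before
after
```

To drop levels in a single column only, dataset.x$type <- factor(dataset.x$type) should have the same effect for that column.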

If you add a small reproducible example, it will help others get on the same page and give better advice if this isn't right.

Upvotes: 3
