Alessandro Jacopson

Reputation: 18613

Filtering a data frame in R and an unwanted filtered out result

This snippet:

names<-c("Alice","Bob","Charlie")
ages<-c(25,24,25)
friends<-data.frame(names,ages)
a25 <- friends[friends$ages==25,]
a25
table(a25$names)

gives me this output:

    names ages
1   Alice   25
3 Charlie   25

  Alice     Bob Charlie 
      1       0       1

Now, why is "Bob" in the output, since the data frame a25 does not include "Bob"? I would have expected output like this (from the table command):

  Alice  Charlie 
      1        1 

What am I missing?

My environment:

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

Upvotes: 1

Views: 186

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

This question appears to have an answer in the comments. This answer shares one additional approach and consolidates the suggestions from the comments.

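The problem you describe is as follows: there is no "Bob" in a25$names, but when you use table, "Bob" still shows up, with a count of 0. This is because names is a factor (data.frame converts character vectors to factors by default in R versions before 4.0.0), and subsetting a factor retains all the levels of the original column, including the ones that are no longer used.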

table(a25$names)
# 
#   Alice     Bob Charlie 
#       1       0       1 
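
To see the retained level directly, here is a minimal sketch that rebuilds the data from the question and inspects the column. Passing stringsAsFactors = TRUE is an assumption added here, not part of the original snippet: it makes the conversion to a factor explicit so the example behaves the same on R 4.0.0 and later.

```r
# Rebuild the question's data; stringsAsFactors = TRUE is spelled out
# (an addition for illustration) so names is a factor on any R version.
friends <- data.frame(
  names = c("Alice", "Bob", "Charlie"),
  ages = c(25, 24, 25),
  stringsAsFactors = TRUE
)
a25 <- friends[friends$ages == 25, ]

class(a25$names)         # the column is a factor, not a character vector
levels(a25$names)        # all three original levels, "Bob" included
as.character(a25$names)  # the values actually present in the subset
```

table counts every level of a factor, which is why the unused "Bob" level appears with a count of 0.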

Fortunately, there's a function called droplevels that takes care of situations like this:

table(droplevels(a25$names))
# 
#   Alice Charlie 
#       1       1 

The droplevels function can work on a data.frame too, allowing you to do the following:

a25alt <- droplevels(friends[friends$ages==25,])
a25alt
#     names ages
# 1   Alice   25
# 3 Charlie   25
table(a25alt$names)
# 
#   Alice Charlie 
#       1       1 

As mentioned in the comments, as.character and factor also work, because both rebuild the column from the values actually present:

table(as.character(a25$names))
table(factor(a25$names))
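
A short sketch of what these two calls do differently, again with stringsAsFactors = TRUE added as an assumption so it runs the same on current R versions. The names by_chr and by_fct are introduced here only for illustration.

```r
friends <- data.frame(
  names = c("Alice", "Bob", "Charlie"),
  ages = c(25, 24, 25),
  stringsAsFactors = TRUE  # assumption: mimic the pre-4.0.0 default
)
a25 <- friends[friends$ages == 25, ]

# factor() on an existing factor keeps only the levels that occur,
# so the result is still a factor but without "Bob".
levels(factor(a25$names))

# as.character() drops the factor structure entirely; table() then
# counts only the strings that are present.
by_chr <- table(as.character(a25$names))
by_fct <- table(factor(a25$names))
names(by_chr)
names(by_fct)
```

Both tables contain only Alice and Charlie. Prefer factor (or droplevels) if the column should remain a factor afterwards, and as.character if plain strings are enough.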

Upvotes: 1
