Mon

Reputation: 281

Delete rows in data frame if entry appears fewer than x times

I have the following data frame, call it df, which consists of three columns: "Name", "Age", and "ZipCode".

df=      
  Name Age ZipCode
1  Joe  16   60559
2  Jim  20   60637
3  Bob  64   94127
4  Joe  23   94122
5  Bob  45   25462

I want to delete the entire row of df if the Name in it appears fewer than 2 times in the data frame as a whole (and flexibly 3, 4, or x times). Basically keep Bob and Joe in the data frame, but delete Jim. How can I do this?

I tried to turn it into a table:

> table(df$Name)

Bob Jim Joe 
 2   1   2 

But I don't know where to go from there.

Upvotes: 7

Views: 2431

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use ave like this:

df[as.numeric(ave(df$Name, df$Name, FUN=length)) >= 2, ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462

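This answer assumes that df$Name is a character vector, not a factor. If it is a factor, ave tries to store the counts back into the factor and returns NA with warnings, so convert it first with as.character(df$Name).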
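One way around the character-versus-factor issue is to count over row positions instead of over the names themselves. The following is only a sketch: the data frame is rebuilt from the question's example (deliberately with Name as a factor), and min_count is a hypothetical name for the threshold "x" from the question.

```r
# Example data from the question; Name is made a factor on purpose
df <- data.frame(
  Name = c("Joe", "Jim", "Bob", "Joe", "Bob"),
  Age = c(16, 20, 64, 23, 45),
  ZipCode = c(60559, 60637, 94127, 94122, 25462),
  stringsAsFactors = TRUE
)

min_count <- 2  # hypothetical name for the threshold "x"

# ave() over integer row positions yields integer group sizes,
# so it works for both character and factor columns and needs no as.numeric()
n <- ave(seq_along(df$Name), df$Name, FUN = length)

res <- df[n >= min_count, ]
print(res)
```

If Name is a factor, the filtered result still carries "Jim" as an unused level; droplevels(res) removes it if that matters.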


You can also continue with table as follows:

x <- table(df$Name)
df[df$Name %in% names(x[x >= 2]), ]
#   Name Age ZipCode
# 1  Joe  16   60559
# 3  Bob  64   94127
# 4  Joe  23   94122
# 5  Bob  45   25462

Upvotes: 8
