Polo
Polo

Reputation: 41

Remove row conditionally from data frame

How can I remove rows conditionally from a data table?

For example, I have:

Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Banana, 2001
Banana, 2002
Banana, 2003
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004
Dog, 2001
Dog, 2002
Dog, 2004
Water, 2002
Water, 2003
Water, 2004

Then, I want to include only the rows with 2001-2004 per group, i.e.:

Apple, 2001
Apple, 2002
Apple, 2003
Apple, 2004
Candy, 2001
Candy, 2002
Candy, 2003
Candy, 2004

Upvotes: 3

Views: 301

Answers (3)

pe-perry
pe-perry

Reputation: 2631

dplyr approach:

library(dplyr) # or library(tidyverse)
df1 %>% 
    group_by(Col1) %>% 
    filter(all(2001:2004 %in% year))

. %>% filter(TRUE) returns all rows, while . %>% filter(FALSE) drops all rows of data.

Output:

Source: local data frame [8 x 2]
Groups: Col1 [2]

   Col1  year
  <chr> <int>
1 Apple  2001
2 Apple  2002
3 Apple  2003
4 Apple  2004
5 Candy  2001
6 Candy  2002
7 Candy  2003
8 Candy  2004

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389325

With base R we can use ave to get the desired results

df[ave(df$year, df$Col1, FUN = function(x) all(2001:2004 %in% x)) == 1, ]

#   Col1 year
#1  Apple 2001
#2  Apple 2002
#3  Apple 2003
#4  Apple 2004
#8  Candy 2001
#9  Candy 2002
#10 Candy 2003
#11 Candy 2004

Upvotes: 2

akrun
akrun

Reputation: 887971

Using data.table, check if all the 2001:2004 are present %in% the 'year' column for each group of 'Col1', then get the Subset of Data.table

library(data.table)
setDT(df1)[, if(all(2001:2004 %in% year)) .SD, by = Col1]
#    Col1 year
#1: Apple 2001
#2: Apple 2002
#3: Apple 2003
#4: Apple 2004
#5: Candy 2001
#6: Candy 2002
#7: Candy 2003
#8: Candy 2004

data

df1 <- structure(list(Col1 = c("Apple", "Apple", "Apple", "Apple", "Banana", 
"Banana", "Banana", "Candy", "Candy", "Candy", "Candy", "Dog", 
"Dog", "Dog", "Water", "Water", "Water"), year = c(2001L, 2002L, 
 2003L, 2004L, 2001L, 2002L, 2003L, 2001L, 2002L, 2003L, 2004L, 
 2001L, 2002L, 2004L, 2002L, 2003L, 2004L)), .Names = c("Col1", 
 "year"), class = "data.frame", row.names = c(NA, -17L))

Upvotes: 3

Related Questions