Ophelia

Reputation: 29

Deleting multiple rows based on another column's value in R

I hope someone can point me in the right direction with my problem. Let's say we have a data frame like this:

year Plant
2009 Monstera
2010 Monstera
2011 Monstera
2012 Monstera
2014 Monstera
2009 Pilea
2010 Pilea
2011 Pilea
2011 Philodendron
2012 Philodendron
2013 Philodendron

I want to remove the rows of a plant when its years start at 2009, but stop removing as soon as a year is skipped. The final data frame should look like this:

year Plant
2014 Monstera
2011 Philodendron
2012 Philodendron
2013 Philodendron

In the forum I found some information on this problem for Excel, but I can't get it to work since I'm an absolute beginner in programming and R.

Here are my code ideas, which currently don't work:

list1<-list(unique(plants))

For (i in list1){
     if (dataset$year==2009){
     while i 
     -[c(year==2009)]
     ....
 break
  } else {
    ....

I know it's not much, but I really tried, and I hope someone can help.

Thank you!

Upvotes: 0

Views: 37

Answers (1)

Ben

Reputation: 30474

If I understand the logic correctly, you could try this approach.

Using the dplyr package, put your dataset into groups based on the Plant and on runs of consecutive years (rows where the year increases by exactly 1 from the previous row, such as 2009, 2010, 2011...). A new group starts whenever that difference is anything other than 1.

Then use filter to keep only the groups whose first year is not 2009.

The final ungroup and select will remove the made-up Group column so your results only include year and Plant.

library(dplyr)

dataset %>%
  group_by(Plant, Group = c(0, cumsum(diff(year) != 1))) %>%
  filter(first(year) != 2009) %>%
  ungroup() %>%
  select(-Group)

Output

   year Plant       
  <int> <chr>       
1  2014 Monstera    
2  2011 Philodendron
3  2012 Philodendron
4  2013 Philodendron
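
To see how the grouping works, here is a minimal sketch that rebuilds the example data from the question and shows the Group values as a regular column before filtering. The column types (integer year, character Plant) and the name with_groups are assumptions for illustration; the rows are assumed to be sorted by Plant and then year, as in the question, because diff() compares each row with the one directly above it.

```r
library(dplyr)

# Example data rebuilt from the question (assumed types: integer year, character Plant)
dataset <- data.frame(
  year  = c(2009L, 2010L, 2011L, 2012L, 2014L,
            2009L, 2010L, 2011L,
            2011L, 2012L, 2013L),
  Plant = c(rep("Monstera", 5), rep("Pilea", 3), rep("Philodendron", 3)),
  stringsAsFactors = FALSE
)

# Make the run identifier visible: the counter goes up by one every time
# the year does not increase by exactly 1 from the previous row
with_groups <- dataset %>%
  mutate(Group = c(0, cumsum(diff(year) != 1)))

print(with_groups)

# Same filtering as above, applied to the visible Group column
result <- with_groups %>%
  group_by(Plant, Group) %>%
  filter(first(year) != 2009) %>%
  ungroup() %>%
  select(-Group)

print(result)
```

In with_groups, the Monstera rows for 2009 to 2012 share one Group value, while the 2014 row gets its own because 2013 is skipped. That is why only the 2014 row survives for Monstera. If your real data is not sorted, sorting it first (for example with arrange(Plant, year)) should be needed for this approach to work.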

Upvotes: 0
