Remove rows in dataframe, which are not consecutive

Question

My question is related to my other question but it's not quite the same.

Assuming I have this:

DOY  <-c (1:30) # Day of Year
rain <-c (0,0,0,0,0,0,10,0,0,20,10,5,0,0,0,0,0,0,0,5,20,45,5,2,0,0,0,10,0,0)
DF <- data.frame(DOY=DOY, rain=rain)
# Just interested in days with rain
DF <- DF[DF$rain > 0,] 
# assign ID for each rainfall event, but only if Difference in days is >4
DF$ID <-c (1,1+cumsum(diff(DF$DOY) > 4)) 

# Rerranging, so I have the highest value at top for each group
DF <- DF %>%
  group_by(ID) %>%
  arrange(desc(rain),.by_group=TRUE)

My DF looks like this:

DOY rain    ID
10    20     1
 7    10     1
11    10     1
 9     9     1
12     5     1
22    45     2
21    20     2
28    10     2
20     5     2
23     5     2
24     2     2

But only want days which are consecutive from the day of max value. My expected output should then be:

DOY rain    ID
10    20     1
11    10     1
 9     9     1
12     5     1
22    45     2
21    20     2
20     5     2
23     5     2
24     2     2

Rows with DOY = 7 and DOY 28 should be removed in my case, all other DOYs are kept.

Dan Chaltiel · Accepted Answer

I think you can reuse the answer from your other question here (which was quite a neat trick BTW).

DF %>%
  group_by(ID) %>%
  arrange(DOY,.by_group=TRUE) %>%
  mutate(
    max_rain = max(rain),
    i=cumsum(c(TRUE,diff(DOY)>1))
  ) %>%
  group_by(i, ID) %>%
  filter(any(rain==max_rain))

Basically, for each ID, you save the maximal rain and you create another ordering group ID (i here). You have to arrange by DOY for this to work.

Then, you just need to group by both ID and filter out all groups that do not contain the max rain value.

Remove rows in dataframe, which are not consecutive

Answers (2)

Related Questions