Reputation: 446
I would like to keep the first observation using the filter() function from dplyr. Filtering returns many rows that satisfy the same criterion, but I only want to keep the first one, without having to resort to group_by() and distinct(). Is it possible?
I need to extract from a data frame the row with the earliest date stamp and the first row where Status is "Bad".
problem = data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02", "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4))
I can use filter(Date_entry == min(Date_entry)), but then I don't know how to select exactly the first "Bad" outcome.
I tried filter(Date_entry == min(Date_entry) | (Date_status - Date_entry) == min(Date_status - Date_entry)), but that still does not work.
# Expected result: the earliest entry plus the first "Bad" row of problem
solution = data.frame(
  Status = c("Good", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-03")),
  Date_status = as.Date(c("1999-01-01", "1999-01-02")),
  Value = c(150, 14))
Upvotes: 0
Views: 2912
Reputation: 887991
One option is slice(), which selects rows by position:
library(dplyr)
problem %>%
slice(union(which.min(Date_entry), match('Bad', Status)))
Output:
# Status Date_entry Date_status Value
#1 Good 2000-01-01 1999-01-01 150
#2 Bad 2000-01-03 1999-01-02 14
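To see which row positions the slice() call receives, here is a minimal base R sketch of the same index computation. It rebuilds the question's data and uses plain bracket indexing instead of slice() so that it runs without dplyr; the names idx and result are only illustrative.

```r
# Rebuild the example data from the question.
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                         "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02",
                          "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4)
)

# Position of the earliest entry date (only the first one if there are ties).
first_date <- which.min(problem$Date_entry)

# Position of the first row whose Status is "Bad".
first_bad <- match("Bad", problem$Status)

# union() drops the duplicate if both positions happen to be the same row.
idx <- union(first_date, first_bad)

# Base R equivalent of slice(idx).
result <- problem[idx, ]
print(result)
```

Because which.min() and match() each return a single position, this approach keeps at most two rows even when several rows share the minimum date.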
Upvotes: 0
Reputation: 206616
I think what you are asking for could be solved with
problem %>%
filter(Date_entry==min(Date_entry) | cumsum(Status=="Bad")==1)
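Here we keep the row with the minimum date, or the first "Bad" row, using a cumsum (cumulative sum) trick. The running count goes up by one each time a "Bad" is observed, so we select the row where it equals 1 (if present). Note that the count stays at 1 until the second "Bad" appears, so if a non-"Bad" row can follow the first "Bad", use Status=="Bad" & cumsum(Status=="Bad")==1 instead.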
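A small base R sketch of what the running count looks like may help. The status vector below is hypothetical (it is not the question's data) and was chosen so that a "Good" value follows the first "Bad", which is the case where the plain ==1 test and a stricter test differ.

```r
# Hypothetical status vector in which a "Good" follows the first "Bad".
status <- c("Good", "Bad", "Good", "Bad")

# Running count of "Bad" values seen so far.
bad_count <- cumsum(status == "Bad")

# Plain test: TRUE wherever the running count equals 1.
loose <- bad_count == 1

# Stricter test: the row must also be "Bad" itself.
strict <- status == "Bad" & bad_count == 1

print(bad_count)  # 0 1 1 2
print(loose)      # FALSE  TRUE  TRUE FALSE
print(strict)     # FALSE  TRUE FALSE FALSE
```

On the question's data, where every row after the first "Bad" is also "Bad", both tests pick the same single row, so either logical vector could be used inside filter().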
Upvotes: 1
Reputation: 1250
Something like this?
library(dplyr)
df <- data.frame(A=c(1,1,1,1,1,2,2,2,2,2),
B=c(1,2,3,4,5,1,2,3,4,5))
head(df %>% filter(A==1),1)
Upvotes: 0