Date difference within groups based on condition

Question

I have the following data. df is my starting point. df2 is what I want to achieve.

df

    "ID"    "event" "time"       "group"  
    1        FALSE   1990-01-01   1
    2        FALSE   1990-01-02   1
    3        FALSE   1990-01-03   1
    4        TRUE    1990-01-04   1
    5        FALSE   1990-01-02   2
    6        TRUE    1990-01-03   2
    7        FALSE   1990-01-03   3

df2 (what I need)

    "ID"    "event" "time"       "group"  distance
    1        FALSE   1990-01-01   1       3
    2        FALSE   1990-01-02   1       2
    3        FALSE   1990-01-03   1       1 
    4        TRUE    1990-01-04   1       0 
    5        FALSE   1990-01-02   2       1
    6        TRUE    1990-01-03   2       0
    7        FALSE   1990-01-03   3       NA

I need the date-difference (distance column) between any observation within a group to the observation which fulfills the event=TRUE condition. If there is no event occuring in the group a NA should be put in place of the distance indicator.

KenHBS · Accepted Answer

You could first get a data frame only with those rows in df that have df$event = TRUE. Then merge that smaller data frame with df to get a bigger merged_df, with the additional column merged_df$time.y telling you the date of the last event=TRUE occurrence for that group:

merged_df <- merge(x = df, y = df[df$event == TRUE, ], 
                  by = "group", all.x = TRUE)
df$distance <- ( merged_df$time.y - df$time )
df
#  ID event       time group distance
#1  1 FALSE 1990-01-01     1   3 days
#2  2 FALSE 1990-01-02     1   2 days
#3  3 FALSE 1990-01-03     1   1 days
#4  4  TRUE 1990-01-04     1   0 days
#5  5 FALSE 1990-01-02     2   1 days
#6  6  TRUE 1990-01-03     2   0 days
#7  7 FALSE 1990-01-03     3  NA days

Data:

df <- structure(list(ID = 1:7, event = c(FALSE, FALSE, FALSE, TRUE, 
           FALSE, TRUE, FALSE), time = structure(c(7305, 7306, 7307, 7308, 
           7306, 7307, 7307), class = "Date"), group = c(1L, 1L, 1L, 1L, 
           2L, 2L, 3L)), .Names = c("ID", "event", "time", "group"), row.names = c(NA, 
           -7L), class = "data.frame")

Date difference within groups based on condition

Answers (1)

Related Questions