James Harrington
James Harrington

Reputation: 103

Finding value of the next row that satisfies a condition in R

I have this dataframe:

    plans     date                    expiring_date
1   premium   2020-07-12 20:26:54     2020-08-11 20:26:54
2   premium   2020-08-15 00:11:54     2020-09-14 00:11:54
3   premium   2020-09-14 19:59:25     2020-10-14 19:59:25
4   single    2020-09-14 20:02:41     NA
5   double    2020-09-17 16:31:07     NA
6   double    2020-10-07 09:54:38     NA
7   premium   2020-10-15 12:06:57     2020-11-14 12:06:57
8   double    2020-10-15 12:08:46     NA
9   single    2020-10-15 12:13:59     NA

which is produced by this:

df <- data.frame(plans = c("premium", "premium", "premium", "single", "double", "double", "premium", "double", "single"),
                 date = as.POSIXct(c("2020-07-12 20:26:54", "2020-08-15 00:11:54", "2020-09-14 19:59:25",
                                     "2020-09-14 20:02:41", "2020-09-17 16:31:07", "2020-10-07 09:54:38",
                                     "2020-10-15 12:06:57", "2020-10-15 12:08:46", "2020-10-15 12:13:59")),
                 expiring_date = as.POSIXct(c("2020-08-11 20:26:54", "2020-09-14 00:11:54", "2020-10-14 19:59:25",
                                              NA, NA, NA, "2020-11-14 12:06:57", NA, NA)))

What I want is to add a new column that gets the date of the next row with a premium plan for the rows that have a premium plan. So the resulting dataframe will be this:

    plans     date                    expiring_date          next_premium_date
1   premium   2020-07-12 20:26:54     2020-08-11 20:26:54    2020-08-15 00:11:54
2   premium   2020-08-15 00:11:54     2020-09-14 00:11:54    2020-09-14 19:59:25
3   premium   2020-09-14 19:59:25     2020-10-14 19:59:25    2020-10-15 12:06:57
4   single    2020-09-14 20:02:41     NA                     NA
5   double    2020-09-17 16:31:07     NA                     NA
6   double    2020-10-07 09:54:38     NA                     NA
7   premium   2020-10-15 12:06:57     2020-11-14 12:06:57    NA
8   double    2020-10-15 12:08:46     NA                     NA
9   single    2020-10-15 12:13:59     NA                     NA

I am trying with dplyr library because I want to use the group_by function since I have multiple users and I want to do this for every user, but no result so far. Any ideas?

Upvotes: 1

Views: 931

Answers (2)

Benjamin Christoffersen
Benjamin Christoffersen

Reputation: 4841

Here is a base R solution:

res <- merge(
  df, transform(subset(df, plans == "premium", "date"), 
                next_premium_date = c(date[-1], NA)), 
  all.x = TRUE)

# get the old column order
res[, c("plans", "date", "expiring_date", "next_premium_date")]
#R>     plans                date       expiring_date   next_premium_date
#R> 1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54 2020-08-15 00:11:54
#R> 2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54 2020-09-14 19:59:25
#R> 3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25 2020-10-15 12:06:57
#R> 4  single 2020-09-14 20:02:41                <NA>                <NA>
#R> 5  double 2020-09-17 16:31:07                <NA>                <NA>
#R> 6  double 2020-10-07 09:54:38                <NA>                <NA>
#R> 7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57                <NA>
#R> 8  double 2020-10-15 12:08:46                <NA>                <NA>
#R> 9  single 2020-10-15 12:13:59                <NA>                <NA>

I am trying with dplyr library because I want to use the group_by function since I have multiple users and I want to do this for every user, but no result so far.

Here you can also use the, somewhat less user-friendly, by function in base R.

Upvotes: 0

Karthik S
Karthik S

Reputation: 11584

Does this work:

> library(dplyr)
> df %>% filter(plans == 'premium') %>% mutate(next_premium_date = lead(date)) %>% 
+   right_join(df) %>% arrange(date)
Joining, by = c("plans", "date", "expiring_date")
    plans                date       expiring_date   next_premium_date
1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54 2020-08-15 00:11:54
2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54 2020-09-14 19:59:25
3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25 2020-10-15 12:06:57
4  single 2020-09-14 20:02:41                <NA>                <NA>
5  double 2020-09-17 16:31:07                <NA>                <NA>
6  double 2020-10-07 09:54:38                <NA>                <NA>
7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57                <NA>
8  double 2020-10-15 12:08:46                <NA>                <NA>
9  single 2020-10-15 12:13:59                <NA>                <NA>

Upvotes: 1

Related Questions