Reputation: 103
I have this dataframe:
plans date expiring_date
1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54
2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54
3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25
4 single 2020-09-14 20:02:41 NA
5 double 2020-09-17 16:31:07 NA
6 double 2020-10-07 09:54:38 NA
7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57
8 double 2020-10-15 12:08:46 NA
9 single 2020-10-15 12:13:59 NA
which is produced by this:
df <- data.frame(plans = c("premium", "premium", "premium", "single", "double", "double", "premium", "double", "single"),
date = as.POSIXct(c("2020-07-12 20:26:54", "2020-08-15 00:11:54", "2020-09-14 19:59:25",
"2020-09-14 20:02:41", "2020-09-17 16:31:07", "2020-10-07 09:54:38",
"2020-10-15 12:06:57", "2020-10-15 12:08:46", "2020-10-15 12:13:59")),
expiring_date = as.POSIXct(c("2020-08-11 20:26:54", "2020-09-14 00:11:54", "2020-10-14 19:59:25",
NA, NA, NA, "2020-11-14 12:06:57", NA, NA)))
What I want is to add a new column that gets the date of the next row with a premium plan for the rows that have a premium plan. So the resulting dataframe will be this:
plans date expiring_date next_premium_date
1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54 2020-08-15 00:11:54
2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54 2020-09-14 19:59:25
3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25 2020-10-15 12:06:57
4 single 2020-09-14 20:02:41 NA NA
5 double 2020-09-17 16:31:07 NA NA
6 double 2020-10-07 09:54:38 NA NA
7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57 NA
8 double 2020-10-15 12:08:46 NA NA
9 single 2020-10-15 12:13:59 NA NA
I am trying with dplyr
library because I want to use the group_by
function since I have multiple users and I want to do this for every user, but no result so far. Any ideas?
Upvotes: 1
Views: 931
Reputation: 4841
Here is a base R solution:
res <- merge(
df, transform(subset(df, plans == "premium", "date"),
next_premium_date = c(date[-1], NA)),
all.x = TRUE)
# get the old column order
res[, c("plans", "date", "expiring_date", "next_premium_date")]
#R> plans date expiring_date next_premium_date
#R> 1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54 2020-08-15 00:11:54
#R> 2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54 2020-09-14 19:59:25
#R> 3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25 2020-10-15 12:06:57
#R> 4 single 2020-09-14 20:02:41 <NA> <NA>
#R> 5 double 2020-09-17 16:31:07 <NA> <NA>
#R> 6 double 2020-10-07 09:54:38 <NA> <NA>
#R> 7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57 <NA>
#R> 8 double 2020-10-15 12:08:46 <NA> <NA>
#R> 9 single 2020-10-15 12:13:59 <NA> <NA>
I am trying with
dplyr
library because I want to use thegroup_by
function since I have multiple users and I want to do this for every user, but no result so far.
Here you can also use the, somewhat less user-friendly, by
function in base R.
Upvotes: 0
Reputation: 11584
Does this work:
> library(dplyr)
> df %>% filter(plans == 'premium') %>% mutate(next_premium_date = lead(date)) %>%
+ right_join(df) %>% arrange(date)
Joining, by = c("plans", "date", "expiring_date")
plans date expiring_date next_premium_date
1 premium 2020-07-12 20:26:54 2020-08-11 20:26:54 2020-08-15 00:11:54
2 premium 2020-08-15 00:11:54 2020-09-14 00:11:54 2020-09-14 19:59:25
3 premium 2020-09-14 19:59:25 2020-10-14 19:59:25 2020-10-15 12:06:57
4 single 2020-09-14 20:02:41 <NA> <NA>
5 double 2020-09-17 16:31:07 <NA> <NA>
6 double 2020-10-07 09:54:38 <NA> <NA>
7 premium 2020-10-15 12:06:57 2020-11-14 12:06:57 <NA>
8 double 2020-10-15 12:08:46 <NA> <NA>
9 single 2020-10-15 12:13:59 <NA> <NA>
Upvotes: 1