Reputation: 3538
I a have tibble with three columns:
I would like to add a fourth column, last45d, that represents the number of races that have been run in the last 45 days compared to the date of the current row. My reprex includes sample data and my attempt at producing the new row (i get all NAs).
reprex:
library(tidyverse)
library(lubridate)
library(reprex)
df<-tibble(runner=c("D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt"),
race=c(6,5,4,3,2,1,6,5,4,3,2,1,6,5,4,3,2,1),
date=c(ymd('20170625'),ymd('20170524'),ymd('20170420'),ymd('20170329'),ymd('20170308'),ymd('20170215'),ymd('20170625'),ymd('20170524'),ymd('20170410'),ymd('20170329'),ymd('20170304'),ymd('20170215'),ymd('20170615'),ymd('20170524'),ymd('20170428'),ymd('20170329'),ymd('20170301'),ymd('20170225')),
surface=c('T','T','D','T','D','T','T','T','D','T','D','T','T','T','D','T','D','T'),
distance=c(1400,1400,1600,1400,1500,1400,1400,1400,1600,1400,1500,1400,1400,1400,1600,1400,1500,1400),
finish=c(1,2,2,1,2,3,2,3,3,2,1,1,3,1,1,3,3,2)
)
df <- df %>%
group_by(runner) %>%
mutate(last45 = map_int(date, ~ sum(between(as.numeric(difftime(.x, date, units = "days")), 1e-9, 90)))) %>%
ungroup()
df
#> # A tibble: 18 x 7
#> runner race date surface distance finish last45
#> <chr> <dbl> <date> <chr> <dbl> <dbl> <int>
#> 1 D.Wottle 6 2017-06-25 T 1400 1 3
#> 2 D.Wottle 5 2017-05-24 T 1400 2 3
#> 3 D.Wottle 4 2017-04-20 D 1600 2 3
#> 4 D.Wottle 3 2017-03-29 T 1400 1 2
#> 5 D.Wottle 2 2017-03-08 D 1500 2 1
#> 6 D.Wottle 1 2017-02-15 T 1400 3 0
#> 7 C.Hottle 6 2017-06-25 T 1400 2 3
#> 8 C.Hottle 5 2017-05-24 T 1400 3 3
#> 9 C.Hottle 4 2017-04-10 D 1600 3 3
#> 10 C.Hottle 3 2017-03-29 T 1400 2 2
#> 11 C.Hottle 2 2017-03-04 D 1500 1 1
#> 12 C.Hottle 1 2017-02-15 T 1400 1 0
#> 13 JJ.Watt 6 2017-06-15 T 1400 3 3
#> 14 JJ.Watt 5 2017-05-24 T 1400 1 4
#> 15 JJ.Watt 4 2017-04-28 D 1600 1 3
#> 16 JJ.Watt 3 2017-03-29 T 1400 3 2
#> 17 JJ.Watt 2 2017-03-01 D 1500 3 1
#> 18 JJ.Watt 1 2017-02-25 T 1400 2 0
Created on 2020-05-13 by the reprex package (v0.3.0)
This is what I'd like the end result to look like:
Upvotes: 3
Views: 86
Reputation: 160447
df %>%
group_by(runner) %>%
mutate(
last45 = map_int(date, ~ sum(between(as.numeric(difftime(.x, date, units = "days")), 1e-9, 45)))
# ^^^^1 ^^^^2
) %>%
ungroup()
# # A tibble: 6 x 4
# runner race date last45
# <chr> <dbl> <date> <int>
# 1 D.Wottle 6 2017-06-25 1
# 2 D.Wottle 5 2017-05-24 1
# 3 D.Wottle 4 2017-04-20 2
# 4 D.Wottle 3 2017-03-29 2
# 5 D.Wottle 2 2017-03-08 1
# 6 D.Wottle 1 2017-02-15 0
Notes:
the two references to date
are different: "1" (outside the tilde-function) is transferred into .x
one at a time, so .x
will always be a single date; "2" (inside the tilde-function) is the original column of dates, so will have as many values as the current runner has rows; and
I use 1e-9
because if I use 0
, then the current day is always considered; by using 1e-9
(or some equally small number), we get the effective (lower,upper]
bounds, vice the dplyr::between
default of [lower,upper]
(both sides closed).
Upvotes: 4