Reputation: 47
Using dplyr on a data frame of population sizes over time, I'd like to identify the set of time points at which the subpopulations first exceed zero, and also the corresponding set of previous time points (i.e. the latest times before the subpopulations exceed zero). I can find the first set of time points as follows:
df <- data.frame(time = rep(1:4, each = 3),
                 id = rep(letters[1:3], times = 4),
                 population = c(1, 0, 0, 2, 1, 0, 0, 2, 1, 0, 0, 0))
first_gens <- group_by_(df, ~id) %>%
  filter_(~population > 0) %>%
  summarise_(start_time = ~min(time)) %>%
  ungroup()
In this example, the first time points for subpopulations a, b and c are respectively 1, 2 and 3.
What I can't figure out is an easy way to find the previous time points. In this example, the previous time points for subpopulations a, b and c should be respectively NA, 1 and 2 (dealing with the NA case is unimportant as I can filter out such cases).
Edit: I want a solution that works for an arbitrary sequence of time points.
Any help would be much appreciated.
(NB: I'm using "_" forms of dplyr functions to satisfy CRAN package requirements.)
Upvotes: 1
Views: 42
Reputation: 7830
You can use lag(), provided the rows are ordered by time within each id (as they are in your example):
df %>%
  group_by(id) %>%
  summarize(min(time[population > 0]),
            lag(time)[min(which(population > 0))])
> df %>%
+ group_by(id) %>%
+ summarize(min(time[population > 0]),
+ lag(time)[min(which(population > 0))])
# A tibble: 3 x 3
  id    `min(time[population > 0])` `lag(time)[min(which(population > 0))]`
  <fct>                       <int>                                   <int>
1 a                               1                                      NA
2 b                               2                                       1
3 c                               3                                       2
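If you want named columns and don't want to rely on the input row order, something along these lines might work. It is only a sketch: the arrange() step and the column name previous_time are my own additions (start_time is taken from the question), and it uses the non-underscore verbs, so it would need adapting for the "_" forms.

```r
library(dplyr)

df <- data.frame(time = rep(1:4, each = 3),
                 id = rep(letters[1:3], times = 4),
                 population = c(1, 0, 0, 2, 1, 0, 0, 2, 1, 0, 0, 0))

# Sort by time within each id so that lag(time) really is the previous
# time point, whatever order the rows arrived in.
result <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  # which(population > 0)[1] is the row index of the first positive
  # population in the group (NA if the group never exceeds zero).
  summarize(start_time = time[which(population > 0)[1]],
            previous_time = lag(time)[which(population > 0)[1]])

print(result)
```

Because lag() just shifts the sorted time vector by one position, the time points don't have to be consecutive integers. A subpopulation that is already positive at its earliest time point gets NA for previous_time, as does one that never exceeds zero.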
Upvotes: 1