Reputation: 2429
In an experiment, I'm trying to find the time to first birth. There are four animals, identified by id and rep (A1, A2, B1, B2), each with an age and a babies value. For each id and rep, I want to retain only the row where babies were first born:
id <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B")
rep <- c(1,1,1,2,2,2,1,1,1,1,2,2,2,2,2)
age <- c(0,1,2,0,1,2,0,1,2,3,0,1,2,3,4)
babies <- c(0,0,1,0,1,0,0,0,0,1,0,0,0,1,1)
df <- data.frame(id,rep,age,babies)
So the final data frame should look like this:
id <- c("A","A","B","B")
rep <- c(1,2,1,2)
age <- c(2,1,3,3)
babies <- c(1,1,1,1)
expected <- data.frame(id,rep,age,babies)  # separate name so the input df is not overwritten
Upvotes: 6
Views: 89
Reputation: 3293
You only need to group_by
and filter
:
library(dplyr)

df %>%
  group_by(id, rep) %>%
  filter(babies > 0) %>%        # keep rows with at least one baby
  filter(age == min(age)) %>%   # of those, the youngest age per group
  ungroup()
Upvotes: 3
Reputation: 1252
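An alternative:
library(dplyr)

df |>
  group_by(id, rep) |>
  slice(which(c(0, diff(babies)) == 1)) |>
  ungroup()
This keeps each row where babies increases by 1 over the previous row within the group, so a run of later rows with babies == 1 (as in B2) is not returned again.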
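The condition inside slice() above can be tried on plain vectors to see which rows it picks. This is only a small base R sketch: the helper rise_rows and the vectors two_onsets and born_at_start are made up for illustration and are not part of the question's data.

```r
# Baby indicators for one id/rep group, ordered by age
b2 <- c(0, 0, 0, 1, 1)         # B2 from the question
two_onsets <- c(0, 1, 0, 1)    # made up: a birth, a gap, another birth
born_at_start <- c(1, 0, 0)    # made up: babies already present in the first row

# Same condition as inside slice(): positions where babies rises by exactly 1
rise_rows <- function(babies) which(c(0, diff(babies)) == 1)

rise_rows(b2)             # 4
rise_rows(two_onsets)     # 2 4
rise_rows(born_at_start)  # integer(0)
```

On data shaped like the question's this gives one row per group. A group with a gap between births would return more than one row, and a group whose first row already has babies would return none.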
Upvotes: 1
Reputation: 79132
Here is one with arrange:
library(dplyr)

df %>%
  group_by(id, rep) %>%
  arrange(desc(babies), .by_group = TRUE) %>%  # rows with babies come first in each group
  slice(1)
  id      rep   age babies
  <chr> <dbl> <dbl>  <dbl>
1 A         1     2      1
2 A         2     1      1
3 B         1     3      1
4 B         2     3      1
Upvotes: 2
Reputation: 887501
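library(dplyr)

# Option 1: one row per group with the largest babies value
df %>%
  group_by(id, rep) %>%
  slice_max(babies, n = 1, with_ties = FALSE) %>%
  ungroup()

# Option 2: the first row per group where babies > 0
df %>%
  group_by(id, rep) %>%
  filter(row_number() == which(babies > 0)[1]) %>%
  ungroup()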
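For comparison, the "first row with babies > 0 per group" idea can also be sketched without dplyr. This assumes the rows are already ordered by age within each id and rep, as they are in the question's data. The names groups and first_birth are made up for the example.

```r
# Data from the question
id <- c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B")
rep <- c(1,1,1,2,2,2,1,1,1,1,2,2,2,2,2)
age <- c(0,1,2,0,1,2,0,1,2,3,0,1,2,3,4)
babies <- c(0,0,1,0,1,0,0,0,0,1,0,0,0,1,1)
df <- data.frame(id, rep, age, babies)

# One data frame per id/rep combination
groups <- split(df, list(df$id, df$rep), drop = TRUE)

# Keep the first row with babies > 0 in each group (no row if there were no births)
first_birth <- do.call(rbind, lapply(groups, function(g) {
  head(g[g$babies > 0, ], 1)
}))

# Restore id/rep order and drop the row names added by rbind
first_birth <- first_birth[order(first_birth$id, first_birth$rep), ]
rownames(first_birth) <- NULL
first_birth
```

The explicit ordering step is there because split() does not return the groups in id-then-rep order.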
Upvotes: 4