Reputation: 1447
I have a dataframe that summarizes the number of times birds were observed at their breeding site one each day and each hour during daytime (i.e., when the sun was above the horizon). example:
head(df)
ID site day hr nObs
1 19 A 202 11 60
2 19 A 202 13 18
3 19 A 202 15 27
4 8 B 188 8 6
5 8 B 188 9 6
6 8 B 188 11 7
However, this dataframe does not include hours when the bird was not observed. Eg. no line for bird 19 on day 202 at 14 with an nObs value of 0.
I'd like to find a way, preferably with dplyr (tidy verse), to add in those rows for when individuals were not observed.
Upvotes: 2
Views: 159
Reputation: 2737
One way to do this would be to first build a "template" of all possible combinations where birds can be observed and then merge ("left join") the actual observations onto that template:
a = read.table(text = " ID site day hr nObs
1 19 A 202 11 60
2 19 A 202 13 18
3 19 A 202 15 27
4 8 B 188 8 6
5 8 B 188 9 6
6 8 B 188 11 7")
tpl <- expand.grid(c(unique(a[, 1:3]), list(hr = 1:24)))
merge(tpl, a, all.x = TRUE)
Edit based on comment by @user3220999: in case we want to do the process per ID
, we can just use split
to get a list of data.frames per ID, get a list of templates and mapply
merge on the two lists:
a <- split(a, a$ID)
tpl <- lapply(a, function(ai) {
expand.grid(c(unique(ai[, 1:3]), list(hr = 1:24)))
})
res <- mapply(merge, tpl, a, SIMPLIFY = FALSE, MoreArgs = list(all.x = TRUE))
Upvotes: 1
Reputation: 51582
You can use complete
from tidyr
, i.e.
library(tidyverse)
df %>%
group_by(ID, site) %>%
complete(hr = seq(min(hr), max(hr)))
which gives,
# A tibble: 9 x 5 # Groups: ID, site [2] ID site hr day nObs <int> <fct> <int> <int> <int> 1 8 B 8 188 6 2 8 B 9 188 6 3 8 B 10 NA NA 4 8 B 11 188 7 5 19 A 11 202 60 6 19 A 12 NA NA 7 19 A 13 202 18 8 19 A 14 NA NA 9 19 A 15 202 27
Upvotes: 3