Reputation: 149
I would like to summarize a table using dplyr. Here is how I would like to proceed:
year region week site species gps_clutch
2017 sud 18 6 au 337
2017 sud 20 10 au 352
2017 sud 22 10 au 352
2017 sud 24 10 au 352
2017 sud 18 6 aio 337
2017 sud 20 6 aio 352
2017 sud 22 6 au 352
2018 sud 20 6 au 337
2018 sud 20 10 au 352
2018 sud 22 10 au 352
2018 sud 22 10 aio 352
2018 sud 22 6 au 352
2017 nor 19 5 au 337
2017 nor 21 2 au 352
2017 nor 23 5 au 352
2017 nor 25 2 au 352
2017 nor 19 5 aio 337
2017 nor 25 5 aio 352
2017 nor 19 5 au 337
2018 nor 21 2 aio 352
2018 nor 23 5 aio 352
2018 nor 25 2 au 352
2018 nor 23 5 aio 337
2018 nor 23 5 au 352
this code works well to expand the grid as I would like:
dat %>%
group_by(region) %>%
expand(year,site, species,week)
the following code works too, to get the count values but does not expand the grid as I wish (I only get the list of weeks for which I did observe something for each year, not the total number of weeks sampled across both years). Which mean that if in "sud" "2017" I only have records for weeks 20 and 22, the grid will not get expanded to week 18 and 24 :
field_subsetnord %>%
group_by(year,region,site,species,week) %>%
summarise(count_clutch=length(gps_clutch)) %>%
complete(week,nesting(year,sites,species), fill = list(count_clutch = 0))
this is the table I would like to get at the end:
year region week site species count
2017 sud 18 6 au 1
2017 sud 20 6 au 0
2017 sud 22 6 au 1
2017 sud 24 6 au 0
2017 sud 18 6 aio 1
2017 sud 20 6 aio 1
2017 sud 22 6 aio 0
2017 sud 24 6 aio 0
2017 sud 18 10 au 0
2017 sud 20 10 au 1
2017 sud 22 10 au 1
2017 sud 24 10 au 1
2017 sud 18 10 aio 0
2017 sud 20 10 aio 0
2017 sud 22 10 aio 0
2017 sud 24 10 aio 0
2018 sud 18 6 au 0
2018 sud 20 6 au 1
2018 sud 22 6 au 1
2018 sud 24 6 au 0
2018 sud 18 6 aio 0
2018 sud 20 6 aio 0
2018 sud 22 6 aio 0
2018 sud 24 6 aio 0
2018 sud 18 10 au 0
2018 sud 20 10 au 1
2018 sud 22 10 au 1
2018 sud 24 10 au 0
2018 sud 18 10 aio 0
2018 sud 20 10 aio 0
2018 sud 22 10 aio 1
2018 sud 24 10 aio 0
and so on for 2018...
any suggestions to mix these two codes would be appreciated :)
Upvotes: 0
Views: 266
Reputation: 36086
You are so close with your two approaches. Essentially they just need to be combined to get what you're after. :)
Group by region and then complete()
the dataset first, then regroup by all variables and summarise()
. Since the gps_clutch
will now have missing values in it, you can sum up the non-missing values (via !is.na
) in the summarise()
statement to count the clutches.
dat %>%
group_by(region) %>%
complete(year, site, species, week) %>%
group_by(year, region, site, species, week) %>%
summarise(count_clutch = sum( !is.na(gps_clutch) ) )
# A tibble: 64 x 6
# Groups: year, region, site, species [16]
year region site species week count_clutch
<int> <fct> <int> <fct> <int> <int>
1 2017 nor 2 aio 19 0
2 2017 nor 2 aio 21 0
3 2017 nor 2 aio 23 0
4 2017 nor 2 aio 25 0
5 2017 nor 2 au 19 0
6 2017 nor 2 au 21 1
7 2017 nor 2 au 23 0
8 2017 nor 2 au 25 1
9 2017 nor 5 aio 19 1
10 2017 nor 5 aio 21 0
# ... with 54 more rows
Upvotes: 0