Reputation: 83
I've got a little problem using the dplyr group_by function.
After running this:
datasetALL %>% group_by(YEAR,Region) %>% summarise(count_number = n())
here is the result:
YEAR Region count_number
<int> <int> <int>
1 1946 1 2
2 1946 2 3
3 1946 3 1
4 1946 5 1
5 1947 3 1
6 1947 4 1
I would like something like this, with a count of 0 for every YEAR/Region combination that has no rows:
YEAR Region count_number
<int> <int> <int>
1 1946 1 2
2 1946 2 3
3 1946 3 1
4 1946 5 1
5 1946 4 0 #order is not important
6 1947 1 0
7 1947 2 0
8 1947 3 1
9 1947 4 1
10 1947 5 0
I tried to use complete() from the tidyr package, but without success...
Upvotes: 8
Views: 11221
Reputation: 39858
As already mentioned, you can solve this entirely with tidyr, using complete() together with its nesting() helper (here df is the summarised, ungrouped data):
complete(df, YEAR, nesting(Region), fill = list(count_number = 0))
YEAR Region count_number
<int> <int> <dbl>
1 1946 1 2
2 1946 2 3
3 1946 3 1
4 1946 4 0
5 1946 5 1
6 1947 1 0
7 1947 2 0
8 1947 3 1
9 1947 4 1
10 1947 5 0
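For anyone who wants to reproduce this, here is a minimal sketch. The toy datasetALL is hypothetical, made up only to match the counts in the question, and using count() to build df is my assumption rather than something from the posts above. count() returns an ungrouped result when its input is ungrouped, so complete() can be applied directly:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data chosen to reproduce the counts in the question
datasetALL <- tibble(
  YEAR   = c(rep(1946L, 7), 1947L, 1947L),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

# count() tallies rows per YEAR/Region pair; the result is not grouped
df <- count(datasetALL, YEAR, Region, name = "count_number")

# Add every missing YEAR/Region pair and fill its count with 0
# (0L keeps the count column integer)
completed <- complete(df, YEAR, Region, fill = list(count_number = 0L))
completed
```

Passing 0L instead of 0 is only a choice to keep the column integer; either should work as the fill value.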
Upvotes: 2
Reputation: 3447
Using complete from the tidyr package should work. You can find documentation about it here.
What probably happened is that you did not remove the grouping. summarise only drops the last grouping level, so the result is still grouped by YEAR. complete then tries to add the combinations of YEAR and Region within each group, but within a group every combination is already present, so no rows are added. Thus first remove the grouping and then do the complete.
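datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n()) %>%
  # drop the remaining YEAR grouping before completing
  ungroup() %>%
  # add the missing YEAR/Region pairs with a count of 0
  complete(YEAR, Region, fill = list(count_number = 0))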
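To check the grouping explanation yourself, here is a small sketch on made-up data (the toy datasetALL below is hypothetical and only mirrors the counts in the question). It inspects the grouping left over after summarise and then completes the ungrouped result:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mirroring the counts shown in the question
datasetALL <- tibble(
  YEAR   = c(rep(1946L, 7), 1947L, 1947L),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

counts <- datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n())

# summarise() peels off only the last grouping variable (Region),
# so YEAR is still a grouping variable at this point
group_vars(counts)

result <- counts %>%
  ungroup() %>%
  complete(YEAR, Region, fill = list(count_number = 0L))

result
```

With dplyr 1.0 or later, summarise(count_number = n(), .groups = "drop") should be an equivalent way to get rid of the grouping without a separate ungroup() step.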
Upvotes: 20