Reputation: 1327
I am trying to use dplyr
to generate a new column in a data frame, based on the aggregation of values in existing columns. Given my dataframe:
group1 <- c("2019","2019","2019","2018","2018","2017","2017","2017")
group2 <- c("2019-01-01", "2019-01-01","2019-01-01","2018-05-01","2018-06-01","2017-01-01","2017-01-01","2017-02-01")
group3 <- c("A","A","B","A","A","C","C","B")
df <- data.frame("Year" = group1,"Date" = group2,"Sample" = group3)
Gives:
Year Date Sample
1 2019 2019-01-01 A
2 2019 2019-01-01 A
3 2019 2019-01-01 B
4 2018 2018-05-01 A
5 2018 2018-06-01 A
6 2017 2017-01-01 C
7 2017 2017-01-01 C
8 2017 2017-02-01 B
So I'd like to generate new column "Count", that for each row gives the total number of unique dates per sample. So for the above data, I would expect the results to be:
Year Date Sample Count
1 2019 2019-01-01 A 1
2 2019 2019-01-01 A 1
3 2019 2019-02-01 B 1
4 2018 2018-05-01 A 2
5 2018 2018-06-01 C 2
6 2017 2017-01-01 C 1
7 2017 2017-01-01 C 1
8 2017 2017-02-01 B 1
I've tried using the following code in r
:
df %>%
group_by(Year) %>%
group_by(Sample) %>%
group_by(Date) %>%
mutate(Count = n_distinct(Date))
But I'm not getting the correct answer!
Upvotes: 1
Views: 22
Reputation: 14764
You could try:
library(dplyr)
df %>%
group_by(Year, Sample) %>%
mutate(Count = n_distinct(Date))
If you want to pass several variables to group_by
, you need to put them together - what you were doing is cancelling out the previous groupings by each new statement.
Moreover, if you'd like to count unique dates, you shouldn't group by them.
The above code would give:
# A tibble: 8 x 4
# Groups: Year, Sample [6]
Year Date Sample Count
<fct> <fct> <fct> <int>
1 2019 2019-01-01 A 1
2 2019 2019-01-01 A 1
3 2019 2019-01-01 B 1
4 2018 2018-05-01 A 2
5 2018 2018-06-01 A 2
6 2018 2017-01-01 C 1
7 2017 2017-01-01 C 1
8 2017 2017-02-01 B 1
Note that there is a mismatch between your generated data frame and the one you show us. The data frame generated by your code is:
Year Date Sample
1 2019 2019-01-01 A
2 2019 2019-01-01 A
3 2019 2019-01-01 B
4 2018 2018-05-01 A
5 2018 2018-06-01 A
6 2018 2017-01-01 C
7 2017 2017-01-01 C
8 2017 2017-02-01 B
Where indeed the only Sample
with 2 distinct Date
s in a given Year
is A
(in 2018).
Upvotes: 2