Reputation: 7526
My goal is simply to count the number of records in each hour of each day. I thought a simple solution could be found with the dplyr
or data.table
packages:
My data set is extremely simple:
> head(test)
id date hour
1 14869663 2018-01-24 17
2 14869664 2018-01-24 17
3 14869665 2018-01-24 17
4 14869666 2018-01-24 17
5 14869667 2018-01-24 17
6 14869668 2018-01-24 17
I only need to group by two variables (date and hour) and count. The id
doesn't matter. However, these two methods in dplyr
do not seem to produce the desired result (a data frame of the same length of the input data, which includes millions of records, is the output). What am I doing wrong here?
test %>% group_by(date, hour) %>% mutate(count = n())
test %>% add_count(date, hour)
The output would look something like this
> head(output)
n_records date hour
1 700 2018-01-24 0
2 750 2018-01-24 1
3 730 2018-01-24 2
4 700 2018-01-24 3
5 721 2018-01-24 4
6 753 2018-01-24 5
and so on
any suggestions?
Upvotes: 2
Views: 3564
Reputation: 68116
This seems to do the trick:
library(dplyr)
starwars %>%
group_by(gender, species) %>%
count
It appears (h/t to Frank) that the count function can take the grouping fields directly:
starwars %>% count(gender, species)
Upvotes: 7
Reputation: 13570
Base
aggregate(name ~ gender + species, data = starwars, length)
If we want to treat NAs as a group:
species1 <- factor(starwars$species, exclude = "")
gender1 <- factor(starwars$gender, exclude = "")
aggregate(name ~ gender1 + species1, data = starwars, length)
Upvotes: 0