iskandarblue
iskandarblue

Reputation: 7526

Counting and grouping with dplyr

My goal is simply to count the number of records in each hour of each day. I thought a simple solution could be found with the dplyr or data.table packages:

My data set is extremely simple:

> head(test)
        id       date hour
1 14869663 2018-01-24   17
2 14869664 2018-01-24   17
3 14869665 2018-01-24   17
4 14869666 2018-01-24   17
5 14869667 2018-01-24   17
6 14869668 2018-01-24   17

I only need to group by two variables (date and hour) and count. The id doesn't matter. However, these two methods in dplyr do not seem to produce the desired result (a data frame of the same length of the input data, which includes millions of records, is the output). What am I doing wrong here?

test %>% group_by(date, hour) %>% mutate(count = n())
test %>% add_count(date, hour)

The output would look something like this

> head(output)
n_records       date hour
1 700      2018-01-24   0
2 750      2018-01-24   1
3 730      2018-01-24   2
4 700      2018-01-24   3
5 721      2018-01-24   4
6 753      2018-01-24   5

and so on

any suggestions?

Upvotes: 2

Views: 3564

Answers (3)

Paul H
Paul H

Reputation: 68116

This seems to do the trick:

library(dplyr)
starwars %>% 
    group_by(gender, species) %>%
    count

It appears (h/t to Frank) that the count function can take the grouping fields directly:

starwars %>% count(gender, species)

Upvotes: 7

mpalanco
mpalanco

Reputation: 13570

Base

aggregate(name ~ gender + species, data = starwars, length)

If we want to treat NAs as a group:

species1 <- factor(starwars$species, exclude = "")
gender1 <- factor(starwars$gender, exclude = "")
aggregate(name ~ gender1 + species1, data = starwars, length)

Upvotes: 0

chinsoon12
chinsoon12

Reputation: 25225

using data.table,

test[, .N, by=.(date, hour)]

Upvotes: 1

Related Questions