charmee
charmee

Reputation: 1591

Count number of rows by group using dplyr

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me

library(plyr)
ddply(mtcars, .(cyl,gear),nrow)

has output

  cyl gear V1
1   4    3  1
2   4    4  8
3   4    5  2
4   6    3  2
5   6    4  4
6   6    5  1
7   8    3 12
8   8    5  2

Using this code

library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))

has output

  length(cyl)
1          32

I found various functions to pass in to summarise() but none seem to work for me. One function I found is sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found

Tried using n(), which returned

Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?

Upvotes: 143

Views: 288288

Answers (6)

fsure
fsure

Reputation: 335

One could simply do this using dplyr:

library(dplyr)

# Use the function add_count() and name the new variable as "count"

mtcars %>%
  add_count(cyl, gear, name = "count")

This way you do not need to group the columns as the add_count() function does that for you when you mention variables. Also, this allows you to retain other variables in the data frame (if any).

Upvotes: 3

user3026255
user3026255

Reputation: 247

Another approach is to use the double colons as this will help avoid potential conflicts with functions with similar names from other packages.

mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(length(gear))

Upvotes: 23

Quinten
Quinten

Reputation: 41603

Another option is using the function tally from dplyr. Here is a reproducible example:

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  tally()
#> # A tibble: 8 × 3
#> # Groups:   cyl [3]
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> 2     4     4     8
#> 3     4     5     2
#> 4     6     3     2
#> 5     6     4     4
#> 6     6     5     1
#> 7     8     3    12
#> 8     8     5     2

Created on 2022-09-11 with reprex v2.0.2

Upvotes: 1

Matifou
Matifou

Reputation: 8950

Another option, not necesarily more elegant, but does not require to refer to a specific column:

mtcars %>% 
  group_by(cyl, gear) %>%
  do(data.frame(nrow=nrow(.)))

This is equivalent to using count():

library(dplyr, warn.conflicts = FALSE)
all.equal(mtcars %>% 
            group_by(cyl, gear) %>%
            do(data.frame(n=nrow(.))) %>% 
            ungroup(),
          count(mtcars, cyl, gear), check.attributes=FALSE)
#> [1] TRUE

Upvotes: 3

tb.
tb.

Reputation: 788

I think what you are looking for is as follows.

cars_by_cylinders_gears <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(count = n())

This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.

Upvotes: 25

talat
talat

Reputation: 70336

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

But dplyr also offers a handy count function which does exactly the same with less typing:

count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

Upvotes: 195

Related Questions