Reputation: 1591
I am using the mtcars
dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*)
group by clause in SQL. ddply()
from plyr is working for me
library(plyr)
ddply(mtcars, .(cyl,gear),nrow)
has output
cyl gear V1
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
Using this code
library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))
has output
length(cyl)
1 32
I found various functions to pass in to summarise()
but none seem to work for me. One function I found is sum(G)
, which returned
Error in eval(expr, envir, enclos) : object 'G' not found
Tried using n()
, which returned
Error in n() : This function should not be called directly
What am I doing wrong? How can I get group_by()
/ summarise()
to work for me?
Upvotes: 143
Views: 288288
Reputation: 335
One could simply do this using dplyr
:
library(dplyr)
# Use the function add_count() and name the new variable as "count"
mtcars %>%
add_count(cyl, gear, name = "count")
This way you do not need to group the columns as the add_count()
function does that for you when you mention variables. Also, this allows you to retain other variables in the data frame (if any).
Upvotes: 3
Reputation: 247
Another approach is to use the double colons as this will help avoid potential conflicts with functions with similar names from other packages.
mtcars %>%
dplyr::group_by(cyl, gear) %>%
dplyr::summarise(length(gear))
Upvotes: 23
Reputation: 41603
Another option is using the function tally from dplyr
. Here is a reproducible example:
library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
tally()
#> # A tibble: 8 × 3
#> # Groups: cyl [3]
#> cyl gear n
#> <dbl> <dbl> <int>
#> 1 4 3 1
#> 2 4 4 8
#> 3 4 5 2
#> 4 6 3 2
#> 5 6 4 4
#> 6 6 5 1
#> 7 8 3 12
#> 8 8 5 2
Created on 2022-09-11 with reprex v2.0.2
Upvotes: 1
Reputation: 8950
Another option, not necesarily more elegant, but does not require to refer to a specific column:
mtcars %>%
group_by(cyl, gear) %>%
do(data.frame(nrow=nrow(.)))
This is equivalent to using count()
:
library(dplyr, warn.conflicts = FALSE)
all.equal(mtcars %>%
group_by(cyl, gear) %>%
do(data.frame(n=nrow(.))) %>%
ungroup(),
count(mtcars, cyl, gear), check.attributes=FALSE)
#> [1] TRUE
Upvotes: 3
Reputation: 788
I think what you are looking for is as follows.
cars_by_cylinders_gears <- mtcars %>%
group_by(cyl, gear) %>%
summarise(count = n())
This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.
Upvotes: 25
Reputation: 70336
There's a special function n()
in dplyr to count rows (potentially within groups):
library(dplyr)
mtcars %>%
group_by(cyl, gear) %>%
summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
# cyl gear n
# (dbl) (dbl) (int)
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
But dplyr also offers a handy count
function which does exactly the same with less typing:
count(mtcars, cyl, gear) # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
# cyl gear n
# (dbl) (dbl) (int)
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
Upvotes: 195