Reputation:
I've tried to look this up on SO
, but can't find any simple answers. My question is extremely simple.
Have a look at the mpg
dataset in ggplot2
package. I want to first group by model
and manufacturer
, then count the frequency of the character strings in the "drv"
column ("f", "r", "4")
.
I've tried:
mpg %>%
group_by(model, manufacturer) %>%
summarise(sum_drv = sum(drv))
Which doesn't work because the drv column contains character values.
Thanks,
Upvotes: 0
Views: 3738
Reputation: 16121
You didn't provide an ideal output so, you can have two different formats.
All drv
values in a column:
library(tidyverse)
mpg %>% count(model, manufacturer, drv)
# # A tibble: 38 x 4
# model manufacturer drv n
# <chr> <chr> <chr> <int>
# 1 4runner 4wd toyota 4 6
# 2 a4 audi f 7
# 3 a4 quattro audi 4 8
# 4 a6 quattro audi 4 3
# 5 altima nissan f 6
# 6 c1500 suburban 2wd chevrolet r 5
# 7 camry toyota f 7
# 8 camry solara toyota f 7
# 9 caravan 2wd dodge f 11
#10 civic honda f 9
# # ... with 28 more rows
Each drv
value as a column:
mpg %>% count(model, manufacturer, drv) %>% spread(drv, n, fill=0)
# # A tibble: 38 x 5
# model manufacturer `4` f r
# * <chr> <chr> <dbl> <dbl> <dbl>
# 1 4runner 4wd toyota 6.00 0 0
# 2 a4 audi 0 7.00 0
# 3 a4 quattro audi 8.00 0 0
# 4 a6 quattro audi 3.00 0 0
# 5 altima nissan 0 6.00 0
# 6 c1500 suburban 2wd chevrolet 0 0 5.00
# 7 camry toyota 0 7.00 0
# 8 camry solara toyota 0 7.00 0
# 9 caravan 2wd dodge 0 11.0 0
#10 civic honda 0 9.00 0
# # ... with 28 more rows
Upvotes: 2
Reputation: 37879
I think table
is what you are looking for:
mpg %>%
group_by(model, manufacturer, drv) %>%
summarise(sum_drv = as.numeric(table(drv)))
# # A tibble: 38 x 4
# # Groups: model, manufacturer [?]
# model manufacturer drv sum_drv
# <chr> <chr> <chr> <dbl>
# 1 4runner 4wd toyota 4 6
# 2 a4 audi f 7
# 3 a4 quattro audi 4 8
# 4 a6 quattro audi 4 3
# 5 altima nissan f 6
# 6 c1500 suburban 2wd chevrolet r 5
# 7 camry toyota f 7
# 8 camry solara toyota f 7
# 9 caravan 2wd dodge f 11
# 10 civic honda f 9
# # ... with 28 more rows
Upvotes: 1
Reputation: 20095
You can make a simple change to get the count/frequency as below. Include drv
as part by group_by
and then you have to just count per group.
mpg %>%
group_by(model, manufacturer, drv) %>%
summarise(sum_drv = n())
Results:
# A tibble: 38 x 4
# Groups: model, manufacturer [?]
model manufacturer drv sum_drv
<chr> <chr> <chr> <int>
1 4runner 4wd toyota 4 6
2 a4 audi f 7
3 a4 quattro audi 4 8
4 a6 quattro audi 4 3
5 altima nissan f 6
6 c1500 suburban 2wd chevrolet r 5
7 camry toyota f 7
8 camry solara toyota f 7
9 caravan 2wd dodge f 11
10 civic honda f 9
Upvotes: 1