Reputation: 1068
In a data frame like data
below:
library(tidyverse)
ID <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y","Z", "a","b","c","d")
State <- rep(c("FL", "GA", "SC", "NC", "VA", "GA"), each = 5)
Location <- rep(c("alpha", "beta", "gamma"), each = 10)
Var3 <- rep(c("Bravo", "Charlie", "Delta", "Echo"), times = c(7,8,10,5))
Sex <- rep(c("M","F","M"), times = 10)
data <- data.frame(ID, State, Location, Var3, Sex)
I want to return a data frame, or a list of several data frames, that summarize each way the data can be grouped. I want to see how many individual ID
s are in each State
, Location
, and Var3
, how many M
and F
are in each State
, Location
, and Var3
, how many Locations
are in each State
, ect... what is the best way to achieve this.
Upvotes: 2
Views: 666
Reputation: 887128
We can use count
library(dplyr)
data %>%
count(State, Location, Var3, Sex)
Also, to get rollup/cube
way of hierarchial counts,
library(data.table)
rollup(as.data.table(data), j = .N, by = c("State","Location","Var3", "Sex"))
# State Location Var3 Sex N
# 1: FL alpha Bravo M 3
# 2: FL alpha Bravo F 2
# 3: GA alpha Bravo M 2
# 4: GA alpha Charlie F 1
# 5: GA alpha Charlie M 2
# 6: SC beta Charlie F 2
# 7: SC beta Charlie M 3
# 8: NC beta Delta M 3
# 9: NC beta Delta F 2
#10: VA gamma Delta M 4
#11: VA gamma Delta F 1
#12: GA gamma Echo F 2
#13: GA gamma Echo M 3
#14: FL alpha Bravo <NA> 5
#15: GA alpha Bravo <NA> 2
#16: GA alpha Charlie <NA> 3
#17: SC beta Charlie <NA> 5
#18: NC beta Delta <NA> 5
#19: VA gamma Delta <NA> 5
#20: GA gamma Echo <NA> 5
#21: FL alpha <NA> <NA> 5
#22: GA alpha <NA> <NA> 5
#23: SC beta <NA> <NA> 5
#24: NC beta <NA> <NA> 5
#25: VA gamma <NA> <NA> 5
#26: GA gamma <NA> <NA> 5
#27: FL <NA> <NA> <NA> 5
#28: GA <NA> <NA> <NA> 10
#29: SC <NA> <NA> <NA> 5
#30: NC <NA> <NA> <NA> 5
#31: VA <NA> <NA> <NA> 5
#32: <NA> <NA> <NA> <NA> 30
# State Location Var3 Sex N
Or use cube
cube(as.data.table(data), j = .N, by = c("State","Location","Var3", "Sex"))
#. State Location Var3 Sex N
# 1: FL alpha Bravo M 3
# 2: FL alpha Bravo F 2
# 3: GA alpha Bravo M 2
# 4: GA alpha Charlie F 1
# 5: GA alpha Charlie M 2
# ---
#111: <NA> <NA> Delta <NA> 10
#112: <NA> <NA> Echo <NA> 5
#113: <NA> <NA> <NA> M 20
#114: <NA> <NA> <NA> F 10
#115: <NA> <NA> <NA> <NA> 30
Upvotes: 2
Reputation: 39858
One dplyr
and purrr
solution to group by all possible combinations of column names could be:
map2(list(colnames(data)),
1:ncol(data),
combn, simplify = FALSE) %>%
flatten() %>%
map(~ data %>%
group_by_at(.x) %>%
tally())
In this case, there are 31 possible combinations of column names, so it returns 31 lists. The first three lists:
[[1]]
# A tibble: 30 x 2
ID n
<fct> <int>
1 a 1
2 A 1
3 b 1
4 B 1
5 c 1
6 C 1
7 d 1
8 D 1
9 E 1
10 F 1
# … with 20 more rows
[[2]]
# A tibble: 5 x 2
State n
<fct> <int>
1 FL 5
2 GA 10
3 NC 5
4 SC 5
5 VA 5
[[3]]
# A tibble: 3 x 2
Location n
<fct> <int>
1 alpha 10
2 beta 10
3 gamma 10
Upvotes: 1