Reputation: 1040
I can use the "tabyl" function from the janitor package like this to apply it to every column:
lapply(mtcars[,2:4],tabyl)
What I really want to do is group by cyl and then apply tabyl to all of those specified columns, something like this (does not work):
lapply(mtcars[,2:4],tabyl(cyl))
How would I put the line above into an lapply call? Or is there some other way to apply group-by logic here?
Please note, I have hundreds of variables in my actual data, and I want to apply tabyl to almost all of them (all the numeric ones at least). So I need a way of calling tabyl on them without explicitly typing out the variable names.
I want it to look like this (provided in an answer below), except I want to include MANY more variables. Imagine mtcars has 104 variables, and I want to apply this grouped tabyl to only the numeric ones.
cyl
4 6 8
n Percent n Percent n Percent
disp 71.1 1 9.091 0 0.00 0 0.000
75.7 1 9.091 0 0.00 0 0.000
78.7 1 9.091 0 0.00 0 0.000
79 1 9.091 0 0.00 0 0.000
95.1 1 9.091 0 0.00 0 0.000
108 1 9.091 0 0.00 0 0.000
120.1 1 9.091 0 0.00 0 0.000
120.3 1 9.091 0 0.00 0 0.000
121 1 9.091 0 0.00 0 0.000
140.8 1 9.091 0 0.00 0 0.000
145 0 0.000 1 14.29 0 0.000
146.7 1 9.091 0 0.00 0 0.000
160 0 0.000 2 28.57 0 0.000
167.6 0 0.000 2 28.57 0 0.000
225 0 0.000 1 14.29 0 0.000
258 0 0.000 1 14.29 0 0.000
275.8 0 0.000 0 0.00 3 21.429
301 0 0.000 0 0.00 1 7.143
304 0 0.000 0 0.00 1 7.143
318 0 0.000 0 0.00 1 7.143
350 0 0.000 0 0.00 1 7.143
351 0 0.000 0 0.00 1 7.143
360 0 0.000 0 0.00 2 14.286
400 0 0.000 0 0.00 1 7.143
440 0 0.000 0 0.00 1 7.143
460 0 0.000 0 0.00 1 7.143
472 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
hp 52 1 9.091 0 0.00 0 0.000
62 1 9.091 0 0.00 0 0.000
65 1 9.091 0 0.00 0 0.000
66 2 18.182 0 0.00 0 0.000
91 1 9.091 0 0.00 0 0.000
93 1 9.091 0 0.00 0 0.000
95 1 9.091 0 0.00 0 0.000
97 1 9.091 0 0.00 0 0.000
105 0 0.000 1 14.29 0 0.000
109 1 9.091 0 0.00 0 0.000
110 0 0.000 3 42.86 0 0.000
113 1 9.091 0 0.00 0 0.000
123 0 0.000 2 28.57 0 0.000
150 0 0.000 0 0.00 2 14.286
175 0 0.000 1 14.29 2 14.286
180 0 0.000 0 0.00 3 21.429
205 0 0.000 0 0.00 1 7.143
215 0 0.000 0 0.00 1 7.143
230 0 0.000 0 0.00 1 7.143
245 0 0.000 0 0.00 2 14.286
264 0 0.000 0 0.00 1 7.143
335 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
Upvotes: 1
Views: 970
Reputation: 10855
There are lots of ways to generate counts and frequencies by multiple variables. A solution with tables::tabular() enables one to display the "by group" on the column dimension, and other variables on the row dimension of a table.
We'll use the mtcars data to display disp and hp on the row dimension, and cyl on the column dimension.
library(tables)
tabular(((Factor(disp) + 1) + (Factor(hp) + 1))~(Factor(cyl))*((n=1) + Percent("col")),data = mtcars)
...and the output:
cyl
4 6 8
n Percent n Percent n Percent
disp 71.1 1 9.091 0 0.00 0 0.000
75.7 1 9.091 0 0.00 0 0.000
78.7 1 9.091 0 0.00 0 0.000
79 1 9.091 0 0.00 0 0.000
95.1 1 9.091 0 0.00 0 0.000
108 1 9.091 0 0.00 0 0.000
120.1 1 9.091 0 0.00 0 0.000
120.3 1 9.091 0 0.00 0 0.000
121 1 9.091 0 0.00 0 0.000
140.8 1 9.091 0 0.00 0 0.000
145 0 0.000 1 14.29 0 0.000
146.7 1 9.091 0 0.00 0 0.000
160 0 0.000 2 28.57 0 0.000
167.6 0 0.000 2 28.57 0 0.000
225 0 0.000 1 14.29 0 0.000
258 0 0.000 1 14.29 0 0.000
275.8 0 0.000 0 0.00 3 21.429
301 0 0.000 0 0.00 1 7.143
304 0 0.000 0 0.00 1 7.143
318 0 0.000 0 0.00 1 7.143
350 0 0.000 0 0.00 1 7.143
351 0 0.000 0 0.00 1 7.143
360 0 0.000 0 0.00 2 14.286
400 0 0.000 0 0.00 1 7.143
440 0 0.000 0 0.00 1 7.143
460 0 0.000 0 0.00 1 7.143
472 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
hp 52 1 9.091 0 0.00 0 0.000
62 1 9.091 0 0.00 0 0.000
65 1 9.091 0 0.00 0 0.000
66 2 18.182 0 0.00 0 0.000
91 1 9.091 0 0.00 0 0.000
93 1 9.091 0 0.00 0 0.000
95 1 9.091 0 0.00 0 0.000
97 1 9.091 0 0.00 0 0.000
105 0 0.000 1 14.29 0 0.000
109 1 9.091 0 0.00 0 0.000
110 0 0.000 3 42.86 0 0.000
113 1 9.091 0 0.00 0 0.000
123 0 0.000 2 28.57 0 0.000
150 0 0.000 0 0.00 2 14.286
175 0 0.000 1 14.29 2 14.286
180 0 0.000 0 0.00 3 21.429
205 0 0.000 0 0.00 1 7.143
215 0 0.000 0 0.00 1 7.143
230 0 0.000 0 0.00 1 7.143
245 0 0.000 0 0.00 2 14.286
264 0 0.000 0 0.00 1 7.143
335 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
In the comments to my answer, the original poster asked how one might automate tabular() to avoid having to type out all the variables to be tabulated. We can do this with lapply() and an anonymous function.
Since the OP used column numbers as part of their question, we'll create a vector of columns from the mtcars data frame to be tabulated. We'll use that as the input to lapply(), along with two other arguments, one for the data frame, and another to specify the column variable in the table. Since the column variable will be a single variable, we specified it with its column name rather than a number.
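# generalize and automate
library(tables)
varList <- 2:4  # column numbers to tabulate (cyl, disp, hp)
lapply(varList, function(x, df, byVar) {
  # label the rows with the column's name and the columns with byVar
  tabular((Factor(df[[x]], paste(colnames(df)[x])) + 1) ~
            ((Factor(df[[byVar]], paste(byVar))) * ((n = 1) + Percent("col"))),
          data = df)
}, mtcars, "cyl")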
The tricky part is automating the process without the output tables having row headers of df[[x]] and column headers of df[[byVar]]. To avoid this, we extract the column name for the row dimension with colnames(), and we overwrite the header for the columns by pasting the byVar argument into the header.
...and the output:
[[1]]
cyl
4 6 8
cyl n Percent n Percent n Percent
4 11 100 0 0 0 0
6 0 0 7 100 0 0
8 0 0 0 0 14 100
All 11 100 7 100 14 100
[[2]]
cyl
4 6 8
disp n Percent n Percent n Percent
71.1 1 9.091 0 0.00 0 0.000
75.7 1 9.091 0 0.00 0 0.000
78.7 1 9.091 0 0.00 0 0.000
79 1 9.091 0 0.00 0 0.000
95.1 1 9.091 0 0.00 0 0.000
108 1 9.091 0 0.00 0 0.000
120.1 1 9.091 0 0.00 0 0.000
120.3 1 9.091 0 0.00 0 0.000
121 1 9.091 0 0.00 0 0.000
140.8 1 9.091 0 0.00 0 0.000
145 0 0.000 1 14.29 0 0.000
146.7 1 9.091 0 0.00 0 0.000
160 0 0.000 2 28.57 0 0.000
167.6 0 0.000 2 28.57 0 0.000
225 0 0.000 1 14.29 0 0.000
258 0 0.000 1 14.29 0 0.000
275.8 0 0.000 0 0.00 3 21.429
301 0 0.000 0 0.00 1 7.143
304 0 0.000 0 0.00 1 7.143
318 0 0.000 0 0.00 1 7.143
350 0 0.000 0 0.00 1 7.143
351 0 0.000 0 0.00 1 7.143
360 0 0.000 0 0.00 2 14.286
400 0 0.000 0 0.00 1 7.143
440 0 0.000 0 0.00 1 7.143
460 0 0.000 0 0.00 1 7.143
472 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
[[3]]
cyl
4 6 8
hp n Percent n Percent n Percent
52 1 9.091 0 0.00 0 0.000
62 1 9.091 0 0.00 0 0.000
65 1 9.091 0 0.00 0 0.000
66 2 18.182 0 0.00 0 0.000
91 1 9.091 0 0.00 0 0.000
93 1 9.091 0 0.00 0 0.000
95 1 9.091 0 0.00 0 0.000
97 1 9.091 0 0.00 0 0.000
105 0 0.000 1 14.29 0 0.000
109 1 9.091 0 0.00 0 0.000
110 0 0.000 3 42.86 0 0.000
113 1 9.091 0 0.00 0 0.000
123 0 0.000 2 28.57 0 0.000
150 0 0.000 0 0.00 2 14.286
175 0 0.000 1 14.29 2 14.286
180 0 0.000 0 0.00 3 21.429
205 0 0.000 0 0.00 1 7.143
215 0 0.000 0 0.00 1 7.143
230 0 0.000 0 0.00 1 7.143
245 0 0.000 0 0.00 2 14.286
264 0 0.000 0 0.00 1 7.143
335 0 0.000 0 0.00 1 7.143
All 11 100.000 7 100.00 14 100.000
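The question also asks for only the numeric columns out of a hundred or more. Here is one way to build varList without typing column numbers, as a minimal sketch. It assumes every numeric column except the grouping column should be tabulated, and the names byVar and isNumeric are mine, not part of the tables package.

```r
# Assumption: tabulate every numeric column except the grouping column.
byVar <- "cyl"

# TRUE for each numeric column; vapply() keeps the column names
isNumeric <- vapply(mtcars, is.numeric, logical(1))

# named integer vector of column positions, with the grouping column dropped
varList <- which(isNumeric & names(mtcars) != byVar)
varList
```

Passing this varList to the lapply() call above in place of 2:4 should produce one table per numeric column, and because the vector is named, the resulting list should be named by column as well. Keep in mind that continuous variables with many distinct values will give very long tables.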
Upvotes: 1
Reputation: 46908
One way is this, although I don't know if you need the cyl column:
by(mtcars[,2:4],mtcars$cyl,lapply,tabyl)
Or a tidy way (I think the list part can be improved):
library(dplyr)
library(janitor)
out = mtcars[,2:4] %>%
  mutate(id=cyl) %>%
  group_by(id) %>% summarize_all(~list(tabyl(.)))
out
# A tibble: 3 x 4
     id cyl              disp              hp
  <dbl> <list>           <list>            <list>
1     4 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [10 × 3]>
2     6 <df[,3] [1 × 3]> <df[,3] [5 × 3]>  <df[,3] [4 × 3]>
3     8 <df[,3] [1 × 3]> <df[,3] [11 × 3]> <df[,3] [9 × 3]>
out %>% filter(id==4) %>% pull(hp)
[[1]]
   . n    percent
  52 1 0.09090909
  62 1 0.09090909
  65 1 0.09090909
  66 2 0.18181818
  91 1 0.09090909
  93 1 0.09090909
  95 1 0.09090909
  97 1 0.09090909
 109 1 0.09090909
 113 1 0.09090909
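If the goal is a count of each value by cyl rather than one list per group, tabyl() can also cross-tabulate two columns of a data frame. This is a rough sketch, assuming every numeric column except cyl is wanted. The names numVars and crossTabs, and the helper columns value and group, are made up for this example.

```r
library(janitor)

byVar <- "cyl"  # grouping column, as in the question

# Assumption: tabulate every numeric column except the grouping column
numVars <- setdiff(names(mtcars)[vapply(mtcars, is.numeric, logical(1))], byVar)

# sapply(..., simplify = FALSE) returns a list named after the variables
crossTabs <- sapply(numVars, function(v) {
  # two-column helper frame with fixed names, so tabyl() can take bare column names
  d <- data.frame(value = mtcars[[v]], group = mtcars[[byVar]])
  tabyl(d, value, group)
}, simplify = FALSE)

# counts of each hp value within each cyl group
crossTabs$hp

# the same table as column proportions (each cyl column sums to 1)
adorn_percentages(crossTabs$hp, denominator = "col")
```

Copying each variable into a small helper frame avoids passing column names to tabyl() programmatically, at the cost of the first column being labelled value in every table. The list names tell you which variable each table belongs to.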
Upvotes: 1