Reputation: 19
I have several datasets, each with a different number of factor variables and an output variable. For each dataset I need to find the number of rows (observations) for each factor level of every factor variable (column). I thought a for loop might do the trick, but I am struggling with it. Could someone please help with this?
The data set looks something like this: [image]
and I want the output to be: [image]
I have tried
for (i in 1:length(df)){
df %>% group_by(df[[i]]) %>% summarise(n = length(i))%>%print()
}
but this doesn't seem to work.
Upvotes: 0
Views: 1301
Reputation: 5138
If you are OK with a list format, you could stop after creating the list. However, this is a (somewhat complex) alternative to the gather method proposed by akrun:
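library(dplyr)
# Getting a vector of the factor variables in the dataset
factor_vars <- names(mtcars)[sapply(mtcars, is.factor)]
# Creating a list of frequency tables, one per factor variable
freq_tables <- lapply(factor_vars, function(x) group_by(mtcars, across(all_of(x))) %>% tally())
# Prepending the variable name as the first column of each table
freq_tables <- lapply(freq_tables, function(x) cbind(colnames(x)[1], x))
# Giving every table the same column names and stacking them into one data frame
do.call(rbind, lapply(freq_tables, setNames, c("Factor", "Level", "Count")))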
   Factor Level Count
1      vs     0    18
2      vs     1    14
3      am     0    19
4      am     1    13
5    gear     3    15
6    gear     4    12
7    gear     5     5
8    carb     1     7
9    carb     2    10
10   carb     3     3
11   carb     4    10
12   carb     6     1
13   carb     8     1
Data:
mtcars[8:11] <- lapply(mtcars[8:11], factor)
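If the list format mentioned at the start of this answer is enough, a minimal base R sketch might look like the following. It assumes the same mtcars setup as above; the names dat and freq_list are only illustrative and do not come from the original answer.

```r
# Assumption: same setup as the answer, with columns 8:11 of mtcars as factors.
# 'dat' is a working copy so the built-in dataset is left untouched.
dat <- mtcars
dat[8:11] <- lapply(dat[8:11], factor)

# Names of the factor columns
factor_vars <- names(dat)[sapply(dat, is.factor)]

# One frequency table per factor column, kept as a named list
freq_list <- lapply(dat[factor_vars], table)

# Look up the counts for a single variable by name
freq_list$gear
```

Each element of freq_list is an ordinary table object, so as.data.frame(freq_list$gear) can be used if a data frame is needed for one variable.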
Upvotes: 1
Reputation: 302
You should be able to do something like
by(data$x, data$y, function)
where data$x is the variable you want summarised, data$y is the variable you group by, and function is what you want done to each group (e.g. mean, length, shapiro.test). You can then coerce the output to a vector using as.vector().
If, for instance, I have a data frame df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3), value = c(10, 20, 30, 40, 50, 60, 70)), then running as.vector(by(df$value, df$ID, length)) would return the vector (4, 2, 1).
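As a runnable sketch of this example, the snippet below builds the data frame and counts rows per ID. The tapply() line is an addition not in the original answer; it is shown only because it keeps the group labels as names, which as.vector() drops.

```r
# Example data from the answer above
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3),
                 value = c(10, 20, 30, 40, 50, 60, 70))

# Number of rows for each ID, as a plain vector
counts <- as.vector(by(df$value, df$ID, length))

# tapply() returns the same counts but keeps the ID values as names
named_counts <- tapply(df$value, df$ID, length)
```

With several factor columns, the same call could be repeated per column, for example inside lapply().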
Upvotes: 1
Reputation: 887301
An option is to gather into 'long' format and then do the count:
library(tidyverse)
gather(df1, Variable, Factor_Level, var1:var3) %>%
count(Variable, Factor_Level)
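In current versions of tidyr, gather() is superseded by pivot_longer(), so the same idea might be sketched as below. The contents of df1 are made up for illustration, since the question's data was only posted as an image; only the column names var1:var3 are taken from the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the values here are invented for illustration only
df1 <- data.frame(
  var1 = factor(c("a", "a", "b", "b")),
  var2 = factor(c("x", "y", "y", "y")),
  var3 = factor(c("m", "m", "m", "n")),
  output = c(1.2, 3.4, 5.6, 7.8)
)

counts <- df1 %>%
  # Convert to character so levels from different columns stack as plain strings
  mutate(across(var1:var3, as.character)) %>%
  # Reshape to long format: one row per (variable, level) observation
  pivot_longer(var1:var3, names_to = "Variable", values_to = "Factor_Level") %>%
  # Count rows for each variable and factor level
  count(Variable, Factor_Level)

counts
```

To handle datasets with differing numbers of factor columns, the selection var1:var3 could be replaced with where(is.factor).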
Upvotes: 3