AnT

How can I count the rows for each factor level of each column in a data frame in R?

I have several datasets, each with a different number of factor variables plus one output variable. For each dataset I need the number of rows (observations) for each level of each factor variable, repeated for every variable (column). I thought a for loop might do the trick but am struggling with it. Could someone please help with this?

The dataset looks something like this:

[image: sample of the dataset]

and I want the output to look like this:

[image: desired output]

I have tried

for (i in 1:length(df)) {
  df %>% group_by(df[[i]]) %>% summarise(n = length(i)) %>% print()
}

but this doesn't seem to be working
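The loop above has two problems: length(i) is the length of the loop index, which is always 1, so every group gets n = 1; and 1:length(df) also loops over the numeric output variable. A minimal corrected sketch follows. It assumes dplyr is installed and uses a small hypothetical data frame (var1, var2, output) in place of the original data, which was only shown as an image. It loops over the factor columns only and counts rows with n():

```r
library(dplyr)

# Hypothetical example data: two factor columns plus a numeric output column
df <- data.frame(
  var1 = factor(c("a", "a", "b", "b", "b")),
  var2 = factor(c("x", "y", "x", "x", "y")),
  output = c(1.2, 3.4, 2.2, 0.5, 1.1)
)

# Loop only over the factor columns, not the output variable
factor_cols <- names(df)[sapply(df, is.factor)]

results <- list()
for (col in factor_cols) {
  # n() counts the rows in each group; length(i) would always be 1
  results[[col]] <- df %>%
    group_by(Level = as.character(.data[[col]])) %>%
    summarise(Count = n())
}

# Stack the per-column tables into one long data frame, labelled by column name
counts <- bind_rows(results, .id = "Factor")
print(counts)
```

Converting each level to character before stacking avoids mixing factors that have different level sets.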

Answers (3)

Andrew

If a list of tables is acceptable, you can stop after creating the list. Otherwise, here is a (somewhat more complex) alternative to the gather() method proposed by akrun:

library(dplyr)

# Getting a vector of factor variables in dataset (run the Data step below first)
factor_vars <- names(mtcars)[sapply(mtcars, is.factor)]

# Creating list of frequency tables, one per factor variable
freq_tables <- lapply(factor_vars, function(x) group_by(mtcars, across(all_of(x))) %>% tally())

# Prepend the variable name to each table, then stack them into one data frame
freq_tables <- lapply(freq_tables, function(x) cbind(colnames(x)[1], x))
do.call(rbind, lapply(freq_tables, setNames, c("Factor", "Level", "Count")))

   Factor Level Count
1      vs     0    18
2      vs     1    14
3      am     0    19
4      am     1    13
5    gear     3    15
6    gear     4    12
7    gear     5     5
8    carb     1     7
9    carb     2    10
10   carb     3     3
11   carb     4    10
12   carb     6     1
13   carb     8     1

Data (run this first so that columns 8 to 11 of mtcars, i.e. vs, am, gear and carb, are factors):

mtcars[8:11] <- lapply(mtcars[8:11], factor)
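For comparison, here is a base R sketch of the same list-then-flatten idea that uses table() instead of dplyr. This is an alternative technique, not the code from the answer above; the Data step is repeated so the snippet runs on its own:

```r
# Convert columns 8 to 11 (vs, am, gear, carb) to factors, as in the Data step
mtcars[8:11] <- lapply(mtcars[8:11], factor)

factor_vars <- names(mtcars)[sapply(mtcars, is.factor)]

# The "list format": one frequency table per factor column
tabs <- lapply(mtcars[factor_vars], table)

# Flatten into one long data frame with Factor, Level and Count columns
counts <- do.call(rbind, lapply(factor_vars, function(v) {
  data.frame(Factor = v,
             Level = names(tabs[[v]]),
             Count = as.vector(tabs[[v]]),
             stringsAsFactors = FALSE)
}))
print(counts)
```

The counts should match the Factor/Level/Count table shown above.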

Christoffer Sannes

You should be able to do something like

by(data$x, data$y, function)

where data$x is the variable you want to summarise, data$y is the grouping variable, and function is what you want applied to each group (e.g. mean, length, shapiro.test). You can then coerce the output to a vector using as.vector().

For instance, if I have a data frame df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 3), value = c(10, 20, 30, 40, 50, 60, 70)), then running as.vector(by(df$value, df$ID, length)) would return the vector c(4, 2, 1).
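For counting, base R's tapply() is a closely related function: it splits a vector by a grouping variable and applies a function to each piece. A minimal sketch using the same example data from this answer (tapply() is swapped in here for illustration; it is not the by() call above):

```r
# Same example data as in the answer
df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 3),
  value = c(10, 20, 30, 40, 50, 60, 70)
)

# Split df$value by df$ID and apply length() to each group
counts <- tapply(df$value, df$ID, length)

# Drop the names to get a plain vector of group sizes: 4 2 1
as.vector(counts)
```

To cover every factor column rather than one, the same call can be wrapped in lapply() over the column names.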

akrun

One option is to gather the factor columns into 'long' format and then count:

library(tidyverse)
gather(df1, Variable,  Factor_Level, var1:var3) %>%
     count(Variable, Factor_Level)
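A self-contained sketch of this approach, assuming dplyr 1.0 or later, tidyr 1.0 or later, and a hypothetical df1 whose factor columns are var1 to var3. It uses pivot_longer(), tidyr's newer replacement for gather(), in place of gather(); the counting step is unchanged:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df1: three factor columns plus an output column
df1 <- data.frame(
  var1 = factor(c("a", "a", "b", "b", "b")),
  var2 = factor(c("x", "y", "x", "x", "y")),
  var3 = factor(c("p", "p", "p", "q", "q")),
  output = c(1.2, 3.4, 2.2, 0.5, 1.1)
)

long_counts <- df1 %>%
  # Convert to character so columns with different factor levels stack cleanly
  mutate(across(var1:var3, as.character)) %>%
  # One row per (original row, variable) pair
  pivot_longer(var1:var3, names_to = "Variable", values_to = "Factor_Level") %>%
  # Count rows for each variable/level combination
  count(Variable, Factor_Level)

print(long_counts)
```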
