akhil sood

Reputation: 97

Frequency cross-tabulation in R for categorical variables

I have a table with over 200 categorical variables. Sample:

Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A

I want to get the frequency (the number of times) each category appears in the dataset. Something like this:

I am very new to R and tried using a for loop to get the result. I am sure there are better ways to do this. Can you please help me?

Upvotes: 2

Views: 834

Answers (1)

gung - Reinstate Monica

Reputation: 11893

In general, the most convenient function to count how many tokens you have of each type is ?table:

d <- read.table(text="Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header=TRUE, stringsAsFactors=TRUE)  # factor columns; R >= 4.0 reads strings as character by default
table(d$Cat1)
# A B 
# 3 1 

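The most convenient way to get those counts for every categorical variable in a dataset at once is to use ?summary.data.frame. Note that the columns need to be factors for this to work; character columns only get a length/class/mode summary: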

summary(d)
#  Cat1  Cat2  Cat3 
#  A:3   A:1   A:4  
#  B:1   B:2        
#        C:1        
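
With 200 variables the summary() printout can get hard to read, and for factors with many levels it lumps the less frequent ones into "(Other)". As a rough sketch of an alternative, lapply() can run table() on every column and keep the full results in a list. If every column is first recoded to a shared set of levels, sapply() can return a single category-by-variable matrix of counts. The names counts, lv and xtab are only illustrative, and the code assumes every column of d is categorical.

```r
d <- read.table(text="Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header=TRUE, stringsAsFactors=TRUE)

# one frequency table per variable, stored in a named list
counts <- lapply(d, table)
counts$Cat2
# A B C 
# 1 2 1 

# union of all categories that occur in any column
lv <- sort(unique(unlist(lapply(d, as.character))))

# recode each column to the common levels so every table has the same length;
# sapply() can then simplify the result to a matrix (rows = categories)
xtab <- sapply(d, function(x) table(factor(x, levels = lv)))
xtab
#   Cat1 Cat2 Cat3
# A    3    1    4
# B    1    2    0
# C    0    1    0
```

The list form is handy when the variables have different category sets; the matrix form is closer to a cross-tabulation of category by variable, with zeros where a category never occurs.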

On the other hand, if you want to get a table that collapses over all categorical variables, you can use table() with ?unlist:

table(unlist(d))
# A B C 
# 8 3 1 

To understand what's happening there, the thing to realize is that in R a data frame is a special kind of list: each variable is a vector, and the data frame is a list of vectors of equal length. The unlist() function concatenates those vectors, from the first variable to the last, into one long vector. Note that if you have some non-categorical variables mixed in, you will need to exclude those with something like table(unlist(d[,c(<variables to use>)])).
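
To make the list view concrete, and to show one possible way of filling in the column selection, here is a small sketch. The extra numeric column x and the names d2 and is_cat are made up for illustration. Treating factor and character columns as "categorical" is an assumption that may not fit every dataset.

```r
d <- read.table(text="Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header=TRUE, stringsAsFactors=TRUE)

is.list(d)   # TRUE: a data frame is a list of columns
length(d)    # 3: one list element per variable

# hypothetical mixed dataset: add a numeric column that should not be counted
d2 <- cbind(d, x = c(1.5, 2.0, 3.5, 4.0))

# flag the columns to treat as categorical (assumption: factors and characters)
is_cat <- sapply(d2, function(col) is.factor(col) || is.character(col))

# keep only those columns, collapse them into one vector, and count
table(unlist(d2[is_cat]))
# A B C 
# 8 3 1 
```

Selecting columns by type avoids typing out 200 names by hand, but it is worth checking is_cat before relying on it, since integer-coded categories would be missed.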

Upvotes: 1
