I have a table with over 200 categorical variables. Sample:
```
Cat1 Cat2 Cat3
A    B    A
B    A    A
A    C    A
A    B    A
```
I want to get the frequency (number of times) with which each category appears anywhere in the dataset. Something like this:
I am very new to R and tried using a for loop to get the result, but I am sure there are better ways to do this. Can you please help me?
In general, the most convenient function for counting how many tokens you have of each type is `table()` (see `?table`):
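```r
# stringsAsFactors = TRUE makes the columns factors; since R 4.0.0 the default
# is FALSE, and summary() only reports level counts for factor columns
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)
table(d$Cat1)
# A B
# 3 1
```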
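The most convenient way to run `table()` on every categorical variable in a dataset is `summary()`, which for a data frame dispatches to `summary.data.frame()` (see `?summary.data.frame`) and reports the count of each level of every factor column: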
```r
summary(d)
#  Cat1  Cat2  Cat3
#  A:3   A:1   A:4
#  B:1   B:2
#        C:1
```
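If you want the full frequency table of each column as an R object rather than a printed summary, a minimal sketch is to apply `table()` to every column with `lapply()`, since a data frame is a list of columns. This matters with many categories: by default `summary()` shows only a limited number of levels per factor (controlled by its `maxsum` argument) and lumps the rest into `(Other)`. The name `counts_per_column` is just illustrative.

```r
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

# One complete frequency table per column, returned as a named list
counts_per_column <- lapply(d, table)

# Look up the table for a single variable by name
counts_per_column$Cat2
# A B C
# 1 2 1
```

With 200+ variables, `counts_per_column[["SomeVariable"]]` then retrieves the counts for any one of them without truncation.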
On the other hand, if you want a single table that collapses over all categorical variables, combine `table()` with `unlist()` (see `?unlist`):
```r
table(unlist(d))
# A B C
# 8 3 1
```
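For comparison with the for-loop approach mentioned in the question, here is a sketch of what such a loop might look like (the variable names are hypothetical, not the asker's code). It adds up each column's counts by category name and then checks that the result matches `table(unlist(d))`, i.e. collapsing over all columns is the same as summing the per-column counts.

```r
d <- read.table(text = "Cat1 Cat2 Cat3
A B A
B A A
A C A
A B A", header = TRUE, stringsAsFactors = TRUE)

# Loop version: accumulate each column's counts into one named integer vector
totals <- integer(0)
for (col in names(d)) {
  tab <- table(d[[col]])                 # counts for this column only
  for (lvl in names(tab)) {
    prev <- if (lvl %in% names(totals)) totals[[lvl]] else 0L
    totals[lvl] <- prev + as.integer(tab[lvl])
  }
}
totals <- totals[sort(names(totals))]    # order categories alphabetically

# Vectorised version: one table over all columns at once
collapsed <- table(unlist(d))

# Both give A = 8, B = 3, C = 1
identical(totals, setNames(as.vector(collapsed), names(collapsed)))
```

The one-liner does the same work without manual bookkeeping, which is why it is preferable to the loop.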
To understand what's happening there, realize that in R a data frame is a special kind of list: each variable is a vector, and the data frame is a list of vectors of equal length (cf. here). The `unlist()` function concatenates those vectors, from the first column to the last, into one long vector. Note that if you have some non-categorical variables mixed in, you will need to exclude them with something like `table(unlist(d[,c(<variables to use>)]))`.
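To make the list structure and the column filtering concrete, a minimal sketch: it rebuilds the sample with `data.frame()` and adds a hypothetical numeric column `Score`, then selects the categorical columns automatically (factor or character) instead of naming them by hand. The `vapply()` filter is one possible way to do this, not the only one.

```r
d <- data.frame(
  Cat1  = c("A", "B", "A", "A"),
  Cat2  = c("B", "A", "C", "B"),
  Cat3  = c("A", "A", "A", "A"),
  Score = c(1.5, 2.0, 3.5, 4.0)   # hypothetical non-categorical column
)

is.list(d)    # TRUE: a data frame is a list of column vectors
length(d)     # 4: one list element per column

# TRUE for factor or character columns, FALSE otherwise (here: Score)
is_categorical <- vapply(d, function(x) is.factor(x) || is.character(x),
                         logical(1))

# unlist() concatenates the 3 selected columns into one vector of 4 * 3 = 12 values
length(unlist(d[is_categorical]))

counts <- table(unlist(d[is_categorical]))
counts
# A B C
# 8 3 1
```

Selecting columns by type scales better than listing them when there are hundreds of variables.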