David Rogers

Reputation: 141

Count unique categorical values in R

I started using R recently as a replacement for Excel. The data sets I work with are very large, and I needed a tool that handles them better. I have found many answers about R on this site that helped me build my scripts, but nothing conclusive for one particular type of analysis.

My data looks like this:

Col1  Col2      Col3     ...  Col50
M     18-24     Single   ...  Employed
F     18-24     Married  ...  Unemployed
F     Under 18  Single   ...  Employed

My data sets typically have up to 100,000 rows and 30 to 70 columns, and there are usually no more than 20 unique values per column.

What I want is output that gives the frequency count for each unique value in each column:

Col1
Variable name / F / M
Frequency / 2 / 1

...

Col50
Variable name / Employed / Unemployed
Frequency / 2 / 1

Can anybody at least give me a hint about what to look for to count these categorical values? Do I need a special package? I found some functions that count values, but they appear to handle only numerical values (like the table() function).

Upvotes: 2

Views: 7831

Answers (2)

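If your categorical columns are factors, summary(mydata) gives the number of times each value occurs in each column. Character columns are reported only by length and class, and for factors with many levels the less frequent ones are lumped into "(Other)".

count(mydata$column.name) gives the unique values in that column and their frequencies. Note that count() is not part of base R; it is provided by the plyr package.

To get this for every column at once, apply the counting function across the columns, for example with lapply.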
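A rough sketch of the summary() route in base R, assuming the columns arrive as character strings. The data frame mydata below is a made-up stand-in for the real data, and col1_counts is a name chosen only for illustration. summary() reports per-value counts only for factors, so the columns are converted first.

```r
# Hypothetical stand-in for the real data (two of the question's columns)
mydata <- data.frame(
  Col1 = c("M", "F", "F"),
  Col3 = c("Single", "Married", "Single"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; the [] keeps mydata a data frame
mydata[] <- lapply(mydata, factor)

# Printed overview with level counts for each column
summary(mydata)

# For a single factor, summary() returns a named vector of counts per level
col1_counts <- summary(mydata$Col1)
col1_counts
```

This works well for a quick look, but the printed data frame summary is meant for reading rather than further processing, so a list of per-column tables is usually easier to work with afterwards.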

Upvotes: 0

Dason

Reputation: 61983

table sounds like what you want. It gives the number of occurrences of each value, and it works on character and factor columns as well as numeric ones. To apply table to every column, use lapply:

lapply(your_data, table)
# Example use with the built-in mtcars data set
lapply(mtcars, table)
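To see the same call on data shaped like the question's, here is a minimal sketch. The data frame survey, its three rows, and the names counts and long are invented for illustration; only lapply and table come from the answer itself.

```r
# Hypothetical toy data mirroring the layout in the question
survey <- data.frame(
  Col1 = c("M", "F", "F"),
  Col2 = c("18-24", "18-24", "Under 18"),
  Col3 = c("Single", "Married", "Single"),
  stringsAsFactors = FALSE
)

# One frequency table per column, returned as a named list
counts <- lapply(survey, table)

# Frequencies for a single column
counts$Col1

# A single count can be pulled out by value name
counts$Col3[["Single"]]

# Optional: stack all tables into one long data frame
# with one row per (column, value) pair
long <- do.call(rbind, lapply(names(counts), function(col) {
  data.frame(
    column = col,
    value = names(counts[[col]]),
    n = as.vector(counts[[col]]),
    stringsAsFactors = FALSE
  )
}))
long
```

The list form is convenient for inspecting one column at a time, while the stacked data frame is easier to export or filter when there are 30 to 70 columns.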

Upvotes: 5
