Reputation: 141
I started using R not long ago as a replacement for Excel. The databases I deal with are very large and I needed a new tool to work with them more effectively. I've managed to find a lot of answers on this website about R, which have helped me build my scripts, but I was unable to find anything conclusive for one particular type of analysis.
My data looks like this:
Col1 Col2 Col3 ... Col50
M 18-24 Single ... Employed
F 18-24 Married ... Unemployed
F Under 18 Single ... Employed
The databases I deal with usually have up to 100,000 rows and 30-70 columns, and there are usually no more than 20 unique values per column.
What I want is an output that gives me the frequency counts for each unique value in each column:
Col1
Variable name / F / M
Frequency / 2 / 1
.....
Col50
Variable name / Employed / Unemployed
Frequency / 2 / 1
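For reference, a reproducible sketch of the sample data above (the object name mydata is my own; Col4 through Col49 are omitted):

```r
# Small reproducible version of the example data
mydata <- data.frame(
  Col1  = c("M", "F", "F"),
  Col2  = c("18-24", "18-24", "Under 18"),
  Col3  = c("Single", "Married", "Single"),
  Col50 = c("Employed", "Unemployed", "Employed")
)
```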
Can anybody at least give me a hint of what I should be looking for to count those categorical values? Do I need a special package or something? I was able to find some functions that count values, but they only seem to refer to numerical values (like the table() function).
David Rogers
Upvotes: 2
Views: 7831
Reputation: 1
If you use summary(mydata) it should give you the number of times each unique value occurs in each column, provided the columns are stored as factors.
If you use count(mydata$column.name) (the count function comes from the plyr package) it will give you the unique values in that column and their frequencies.
You should be able to simply use lapply across all the columns to get what you want.
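A minimal sketch of the summary() approach, using made-up data shaped like the question's (note that character columns only report Length/Class/Mode, so the columns need to be factors):

```r
# summary() tabulates counts per level, but only for factor columns
mydata <- data.frame(
  Col1 = c("M", "F", "F"),
  Col3 = c("Single", "Married", "Single"),
  stringsAsFactors = TRUE
)
summary(mydata)
# shows F:2, M:1 for Col1 and Married:1, Single:2 for Col3
```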
Upvotes: 0
Reputation: 61983
table
sounds like what you want. It will give you the number of occurrences of each value. To easily apply table to each column, we can just use lapply:
lapply(your_data, table)
# Example use and output
lapply(mtcars, table)
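The result is a named list with one frequency table per column, so individual columns can be picked out by name. For instance, restricting to two columns of the built-in mtcars data set:

```r
# One frequency table per column, in a named list
freqs <- lapply(mtcars[c("cyl", "gear")], table)
freqs$cyl   # counts of cars with 4, 6, and 8 cylinders: 11, 7, 14
freqs$gear  # counts of cars with 3, 4, and 5 gears: 15, 12, 5
```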
Upvotes: 5